META SEARCH ENGINE



BACKGROUND OF THE INVENTION

This application is a conversion of copending 
provisional application 60/062,958, filed October 10, 
1997. 

A number of useful and popular search engines
attempt to maintain full text indexes of the World Wide
Web. For example, search engines are available from
AltaVista, Excite, HotBot, Infoseek, Lycos and Northern
Light. However, searching the Web can still be a slow
and tedious process. Limitations of the search services
have led to the introduction of meta search engines. A
meta search engine searches the Web by making requests to
multiple search engines such as AltaVista or Infoseek.
The primary advantages of current meta search engines are
the ability to combine the results of multiple search
engines and the ability to provide a consistent user
interface for searching these engines. Experimental
results show that the major search engines index a
relatively small amount of the Web and that combining the
results of multiple engines can therefore return many
documents that would otherwise not be found.

A number of meta search engines are currently
available. Some of the most popular ones are
MetaCrawler, Inference Find, SavvySearch, Fusion,
ProFusion, Highway 61, Mamma, Quarterdeck WebCompass,
Symantec Internet FastFind, and ForeFront WebSeeker.



The principal motivations behind the basic text
meta search capabilities of the meta search engine of
this invention were the poor precision, limited coverage,
limited availability, limited user interfaces, and
out-of-date databases of the major Web search engines. More
specifically, the diverse nature of the Web and the focus
of the Web search engines on handling relatively simple
queries very quickly lead to search results often having
poor precision. Additionally, the practice of "search
engine spamming" has become popular, whereby users add
possibly unrelated keywords to their pages in order to
alter the ranking of their pages. The relevance of a
particular hit is often obvious only after waiting for
the page to load and finding the query term(s) in the
page.

Experience with using different search engines
suggests that the coverage of the individual engines is
relatively low, i.e. searching with a second engine
often returns several documents which were not returned by
the first engine. It has been suggested that AltaVista
limits the number of pages indexed per domain, and that
each search engine has a different strategy for selecting
pages to index. Experimental results confirm that the
coverage of any one search engine is very limited.

In addition, due to search engine and/or
network difficulties, the engine which responds the
quickest varies over time. It is possible to add a
number of features which enhance usability of the search
engines. Centralized search engine databases are always
out of date. There is a time lag between the time when
new information is made available and the time that it is
indexed.



SUMMARY OF THE INVENTION 

An object of this invention is to improve meta
search engines.

Another object of the present invention is to
provide a meta search engine that analyzes each document
and displays local context around the query terms.

A further object of this invention is to
provide a search method that improves on the efficiency
of existing search methods.

A further object of this invention is to
provide a meta search engine that is capable of
displaying the context of the query terms, advanced
duplicate detection, progressive display of results,
highlighting query terms in the pages when viewed,
insertion of quick jump links for finding the query terms
in large pages, dramatically improved precision for
certain queries by using specific expressive forms,
improved relevancy ranking, improved clustering, and
image search.

These and other objectives are attained with a
computer implemented meta search engine and search
method. In accordance with this method, a query is
forwarded to a number of third party search engines, and
the responses from the third party search engines are
parsed in order to extract information regarding the
documents matching the query. The full text of the
documents matching the query is downloaded, and the
query terms in the documents are located. The text
surrounding the query terms is extracted, and that text
is displayed.

The engine downloads the actual pages
corresponding to the hits and searches them for the query
terms. The engine then provides the context in which the
query terms appear rather than a summary of the page
(none of the available search engines or meta search
services currently provides this option). This typically
provides a much better indication of the relevance of a
page than the summaries or abstracts used by other search
engines, and it often helps to avoid looking at a page
only to find that it does not contain the required
information. The context can be particularly helpful
whenever a search includes terms which may occur in a
different context from that required. The amount of
context is specified by the user in terms of the number
of characters on either side of the query terms. Most
non-alphanumeric characters are filtered from the context in
order to produce more readable and informative results.
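
The context extraction described above can be illustrated with a minimal Python sketch. The function name `context_strings`, the `width` parameter, and the exact set of characters kept by the filter are assumptions made for illustration; the text only states that the user chooses the number of characters on either side and that most non-alphanumeric characters are filtered.

```python
import re


def context_strings(text, terms, width=40):
    """Return one cleaned context snippet per occurrence of each query term.

    `width` is the number of characters kept on either side of a match.
    The filter below (keep letters, digits, spaces and a little punctuation)
    is an assumed stand-in for "most non-alphanumeric characters".
    """
    snippets = []
    for term in terms:
        for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            start = max(0, match.start() - width)
            end = min(len(text), match.end() + width)
            raw = text[start:end]
            # Replace runs of filtered characters with a single space.
            cleaned = re.sub(r"[^A-Za-z0-9.,' -]+", " ", raw)
            # Collapse whitespace left behind by the filter.
            cleaned = re.sub(r"\s+", " ", cleaned).strip()
            snippets.append(cleaned)
    return snippets
```

For example, calling `context_strings(page_text, ["watermark"], width=10)` returns one short snippet for every place "watermark" occurs in `page_text`, with markup characters and symbol runs removed.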

Results are returned progressively after each
individual page is downloaded and analyzed, rather than
after all pages are downloaded. The first result is
typically displayed faster than the average time for a
search engine to respond. When multiple pages provide
the information required, the architecture of the meta
engine can be helpful because the fastest sites are the
first ones to be analyzed and displayed.

When viewing the full pages corresponding to
the hits, these pages are filtered to highlight the query
terms, and links are inserted at the top of the page which
jump to the first occurrence of each query term. Links
at each occurrence of the query terms jump to the next
occurrence of the respective term. Query term
highlighting helps to identify the query terms and page
relevance quickly. The links help to find the query
terms quickly in large documents.
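
The page filtering described above might be sketched as follows. This is only an illustration on plain text: the function name `annotate_page`, the anchor naming scheme (`t0-1`, `t0-2`, ...), the use of bold for highlighting, and the choice to wrap the last occurrence back to the first are all assumptions, and a real filter would also have to cope with existing HTML markup. At least one query term must be supplied.

```python
import html
import re


def annotate_page(text, terms):
    """Return an HTML fragment with a jump bar, highlighted terms and next-occurrence links."""
    index = {t.lower(): i for i, t in enumerate(terms)}
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    matches = list(pattern.finditer(text))

    # Count occurrences per term so each link knows whether a next one exists.
    totals = {key: 0 for key in index}
    for m in matches:
        totals[m.group(0).lower()] += 1

    seen = {key: 0 for key in index}
    parts, last = [], 0
    for m in matches:
        key = m.group(0).lower()
        seen[key] += 1
        n = seen[key]
        # Assumption: the last occurrence links back to the first one.
        nxt = n + 1 if n < totals[key] else 1
        parts.append(html.escape(text[last:m.start()]))
        parts.append('<a name="t%d-%d" href="#t%d-%d"><b>%s</b></a>'
                     % (index[key], n, index[key], nxt, html.escape(m.group(0))))
        last = m.end()
    parts.append(html.escape(text[last:]))

    # Jump bar: one link per term to its first occurrence, with the count.
    bar = " ".join('<a href="#t%d-1">%s (%d)</a>' % (index[key], html.escape(key), totals[key])
                   for key in index if totals[key])
    return bar + "<hr>" + "".join(parts)
```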

Pages which are no longer available can be
identified. These pages are listed at the end of the
response. Some other meta search services also provide
"dead link" detection; however, the feature is usually
turned off by default and no results are returned until
all pages are checked. For the meta search engine of
this invention, however, the feature is intrinsic to the
architecture of the engine, which is able to produce
results both incrementally and quickly.

Pages which no longer contain the search terms
or that do not properly match the query can be
identified. These pages are listed after pages which
properly match the query. This can be very important -
different engines use different relevance techniques, and
if just one engine returns poor relevance results, this
can lead to poor results from standard meta search
techniques.

The tedious process of requesting additional
hits can be avoided. The meta search engine understands
how to extract the URL for requesting the next page of
hits from the individual search engine responses. More
advanced detection of duplicate pages is done. Pages are
considered duplicates if the relevant context strings are
identical. This allows the detection of a duplicate even if
the page has a different header or footer.
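
As a rough illustration of the duplicate test described above, the sketch below treats two pages as duplicates when their lists of context strings are identical. The function name `split_duplicates` and the `(url, contexts)` input shape are hypothetical; the text does not say how the comparison is stored or keyed.

```python
def split_duplicates(pages):
    """Split (url, context_strings) pairs into unique pages and duplicates.

    A page is a duplicate when an earlier page produced exactly the same
    context strings, regardless of headers, footers, or URL.
    """
    seen = set()
    unique, duplicates = [], []
    for url, contexts in pages:
        key = tuple(contexts)  # lists are not hashable, tuples are
        if key in seen:
            duplicates.append(url)
        else:
            seen.add(key)
            unique.append(url)
    return unique, duplicates
```

Because only the text around the query terms is compared, a mirrored copy of a page with a different banner or footer is still detected.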

U.S. Patent 5,659,732 (Kirsch) presents a
technique for relevance ranking with meta search
techniques wherein the underlying search engines are
modified to return extra information such as the number
of occurrences of each search term in the documents and
the number of occurrences in the entire database. Such a
technique is not required for the meta search engine of
this invention because the actual pages are downloaded
and analyzed. It is therefore possible to apply a
uniform ranking measure to documents returned by
different engines. Currently, the engine displays pages
in descending order of the number of query terms present
in the document (if none of the first few pages contain
all of the query terms, then the engine initially
displays results which contain the maximum number of
query terms found in a page so far). After all pages
have been downloaded, the engine then relists the pages
according to a simple relevance measure.

This measure currently considers the number of
query terms present in the document, the proximity
between query terms, and term frequency (the usual
inverse document frequency may also be useful; see Salton, G.
(1989), Automatic text processing: the transformation,
analysis and retrieval of information by computer,
Addison-Wesley):

\[
R = c_1 N_p + \frac{c_2 - \dfrac{\sum_{i=1}^{N_p-1} \sum_{j=i+1}^{N_p} \min\bigl(d(i,j),\, c_2\bigr)}{\sum_{k=1}^{N_p-1} (N_p - k)}}{c_2 / c_1} + \frac{N_t}{c_3}
\]

where N_p is the number of query terms that are present in
the document (each term is counted only once), N_t is the
total number of query terms in the document (counting
every occurrence), d(i,j) is
the minimum distance between the ith and the jth of the
query terms which are present in the document (currently
in terms of the number of characters), c_1 is a constant
which controls the overall magnitude of the relevance
measure R, c_2 is a constant specifying the maximum
distance between query terms which is considered useful,
and c_3 is a constant specifying the importance of term
frequency (currently c_1 = 100, c_2 = 5000, and c_3 = 10c_1).
This measure is used for pages containing more than one
of the query terms; when only one query term is found the
term's distance from the start of the page is used.
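
A small Python sketch may help make the measure concrete. It assumes the measure is the sum of three parts: c_1 times N_p, a proximity part equal to c_2 minus the mean pairwise minimum distance (each distance capped at c_2) divided by c_2/c_1, and N_t divided by c_3. The function names and the input format (a mapping from each query term to the character offsets where it occurs) are hypothetical.

```python
from itertools import combinations

C1 = 100.0      # overall magnitude of the measure
C2 = 5000.0     # maximum distance between terms considered useful
C3 = 10 * C1    # importance of term frequency


def min_distance(offsets_a, offsets_b):
    """Smallest character distance between any occurrence of two terms."""
    return min(abs(a - b) for a in offsets_a for b in offsets_b)


def relevance(positions):
    """Relevance R for a page; `positions` maps term -> list of offsets."""
    present = [offs for offs in positions.values() if offs]
    n_p = len(present)                       # distinct query terms present
    n_t = sum(len(offs) for offs in present)  # total occurrences
    if n_p < 2:
        raise ValueError("the measure applies to pages with at least two query terms")
    pairs = list(combinations(present, 2))
    mean_dist = sum(min(min_distance(a, b), C2) for a, b in pairs) / len(pairs)
    return C1 * n_p + (C2 - mean_dist) / (C2 / C1) + n_t / C3
```

With these constants, two terms that occur 100 characters apart contribute 98 to the proximity part, while terms 5000 or more characters apart contribute nothing, so pages with closely spaced terms rank first among pages containing the same number of terms.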

This ranking criterion is particularly useful
with Web searches. A query for multiple terms on the Web
often returns documents which contain all terms, but the
terms are far apart in the document and may be in
unrelated sections of the page, e.g. in separate Usenet
messages archived on a single Web page, or in separate
bookmarks on a page containing a list of bookmarks.

The engine does not use the lowest common
denominator in terms of the search syntax. The engine
supports all common search formats, including boolean
syntax. Queries are dynamically modified in order to
match the query syntax of each individual search engine.
The engine is
capable of tracking the results of queries, automatically
informing users when new documents are found which match
a given query. The engine is capable of tracking the
text of a given page, automatically informing the user
when the text changes and which lines have changed. The
engine includes an advanced clustering technique which
improves on the clustering done in existing search
engines. A specific expressive forms search technique
can dramatically improve precision for certain queries.
A new query expansion technique can automatically perform
intelligent query expansion.

Additional features which could easily be added
to the meta search engine of this invention include
improved relevance measures; alternative ordering
methods, e.g. by site; field searching, e.g. page title,
Usenet message subject, hyperlink text; rules and/or
learning methods for routing queries to specific search
engines; word sense disambiguation; and relevance
feedback.

Further benefits and advantages of the
invention will become apparent from a consideration of
the following detailed description, given with reference
to the accompanying drawings, which specify and show
preferred embodiments of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows the home page of the meta search
engine of this invention.

Figure 2 shows the options page of the meta
search engine of this invention.

Figures 3-8 show, respectively, first through
sixth portions of a sample response of the meta search
engine of the present invention for the query nec and
"digital watermark."

Figure 9 shows a sample page view for the meta
search engine of this invention.

Figure 10 is a simplified control flow chart of
the meta search engine of the present invention.

Figure 11 is a simplified control flow chart
for image meta search.

Figure 12 shows a first portion of a sample
response of the meta search engine of this invention for
the query koala in the image databases, filtered for
photos.

Figure 13 shows a second portion of a sample
response of the meta search engine of this invention for
the query koala in the image databases, filtered for
photos.

Figure 14 shows a sample response of the meta
search engine of this invention for the query koala in
the image databases, filtered for graphics.

Figure 15 shows clusters for the query "joydeep
ghosh."

Figure 16 shows the first two cluster summaries
for the query "joydeep ghosh."

Figure 17 shows the first part of the clusters
for the query "joydeep ghosh" from HuskySearch.

Figure 18 shows the second part of the clusters
for the query "joydeep ghosh" from HuskySearch.

Figure 19 shows clusters for the query "joydeep
ghosh" from AltaVista.

Figure 20 shows clusters produced by the meta
search engine of this invention for the query "neural
network."

Figure 21 shows clusters produced by the meta
search engine of this invention for the query typing and
injury along with the first cluster summary.

Figure 22 shows the response of the meta search
engine of the present invention for the query What does
NASDAQ stand for?

Figure 23 shows the response of Infoseek for
the query What does NASDAQ stand for?

Figure 24 shows the response of the meta search
engine of this invention for the query How is a rainbow
created?

Figure 25 shows the response of Infoseek for
the query How is a rainbow created?

Figure 26 shows the response of the meta search
engine of the present invention for the query What is a
mealy machine?

Figure 27 shows a sample home page showing new
hits for a query and recently modified URLs.

Figure 28 shows a sample page view showing the
text which has been added to the page since the last time
it was viewed.

Figure 29 shows the coverage of each of six
search engines with respect to the combined coverage of
all six.

Figure 30 shows the coverage as the number of
search engines is increased.

Figure 31 shows a comparison of the overlap
between search engines to the number of documents
returned from all six engines combined.

Figure 32 shows the coverage of each search
engine with respect to the estimated size of the
indexable Web.

Figures 33 and 34 show histograms of the major
search engine response times, and a histogram of the
response time for the first response when queries are
made to the six engines simultaneously.

Figure 35 shows the median time for the Web
search engines to respond.

Figure 36 shows the median time for the first
of n Web search engines to respond.

Figure 37 shows the response time for arbitrary
Web pages.

Figure 38 shows the median time to download the
first of n pages requested simultaneously.

Figure 39 shows the time for the meta engine to
display the first result.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

One of the fundamental features of the meta
search engine of this invention is that it analyzes each
document and displays local context around the query
terms. The benefit of displaying the local context,
rather than an abstract or summary of the document, is
that the user may be able to more readily determine if
the document answers his or her specific query. In
essence, this technique admits that the computer may not
be able to accurately determine the relevance of a
particular document, and in lieu of this ability, formats
the information in the best way for the user to quickly
determine relevance. A user can therefore find documents
of high relevance by quickly scanning the local context
of the query terms. This technique is simple, but can be
very effective, especially in the case of Web search
where the database is very large, diverse, and poorly
organized.

The idea of querying and collating results from
multiple databases is not new. Companies like PLS,
Lexis-Nexis, and Verity long ago created systems
which integrate the results of multiple heterogeneous
databases. Many other Web meta search services exist,
such as the popular and useful MetaCrawler service.
Services similar to MetaCrawler include SavvySearch,
Inference Find, Fusion, ProFusion, Highway 61, Mamma,
Quarterdeck WebCompass, Metabot, Symantec Internet
FastFind, and WebSeeker.

Figure 1 shows the home page of the meta search
engine of this invention. The bar 12 at the top contains
links for the options page, the help page, and the
submission of suggestions and problems. Queries are
entered into the "Find:" box 14. The selection of which
search engines to use for the query is made by clicking
on the appropriate selection on the following line. The
options are currently:

1. Web - standard Web search engines: (a)
AltaVista, (b) Excite, (c) Infoseek, (d) HotBot, (e)
Lycos, (f) Northern Light, (g) WebCrawler, and (h) Yahoo.

2. Usenet Databases - indexes of Usenet
newsgroups: (a) AltaVista, (b) DejaNews, and (c)
Reference.com.

3. Press - indexes of press articles and news
wires: (a) Infoseek NewsWire, Industry, and Premier
sources - c/o Infoseek - Reuters, PR NewsWire etc., and
(b) NewsTracker - c/o Excite - online newspapers and
magazines.

4. Images - image indexes: (a) Corel - Corel
image database, (b) HotBot - HotBot images, (c) Lycos -
Lycos images, (d) WebSeer - WebSeer images, (e) Yahoo -
Yahoo images, and (f) AltaVista - AltaVista images.

5. Journals - academic journals: (a) Science.

6. Tech - technical news: (a) TechWeb and (b)
ZDNet.

7. All - all of the above.

The constraints menu 16 follows, which contains
options for constraining the results to specific domains,
specific page ages, and specific image types. The main
options menu 20 follows, which contains options for
selecting the maximum number of results, the amount of
context to display around the query terms (in
characters), and whether or not to activate clustering or
tracking.

The options link on the top bar allows setting
a number of other options, as shown at 22 in figure 2.
These options are: 1. The timeout (per individual page
download), 2. Whether or not to filter the pages when
viewed, 3. Whether or not to filter images from the pages
when viewed, 4. Whether each search displays results in a
new window or not, and 5. Whether or not to perform image
classification (for manual classification of images).
Additionally, the options page shows at 24 and 26 which
queries and URLs are being tracked for changes, and
allows entering a new URL to track.

Figures 3 to 8 show a sample response of the
meta search engine of this invention for the query nec
and "digital watermark". Figure 3 shows the top portion
of the response from the search. The search form can be
seen at the top, followed by a tip 30 which may be query
sensitive. Results which contain all of the query terms
are then displayed as they are retrieved and analyzed (as
mentioned before, if none of the first few pages contain
all of the query terms then the engine initially displays
results which contain the maximum number of query terms
found in a page so far). The bars 32 to the left of the
document titles indicate how close the query terms are in
the documents - longer bars indicate that the query terms
are closer together. The engine which found the
document, the age of the document, the size of the
document, and the URL follow the document title.

After the pages have been retrieved, the engine
then displays the top 20 pages ranked using term
proximity information (figure 4). In descending order,
and referring to figures 5 to 8, the engine then displays
those pages which contain fewer query terms, those pages
which contain none of the query terms, those pages which
contain duplicate context strings, and those pages which
could not be downloaded. Links to the search engine
pages which were used are then provided, followed by
terms which may be useful for query expansion. With
reference to figure 8, the engine then displays a summary
box with information on the number of documents found
from each individual engine, the number retrieved and
processed, and the number of duplicates.

Figure 9 shows a sample of how the individual
pages are processed when viewed. The links 40 at the top
jump to the first occurrence of the query terms in the
document, and indicate the number of occurrences. The
[Track Page] link activates tracking for this page - the
user will be informed when and how the document changes.

The engine comprises two main logical parts:
the meta search code and a parallel page retrieval
daemon. Pseudocode for (a simplified version of) the
search code is as follows:

    Process the request to check syntax and create..
      ..regular expressions which are used to match query..
      ..terms
    Send requests (modified appropriately) to all..
      ..relevant search engines
    Loop for each page retrieved until maximum number..
      ..of results or all pages retrieved
        If page is from a search engine
            Parse search engine response extracting hits..
              ..and any link for the next set of results
            Send requests for all of the hits
            Send requests for the next set of results..
              ..if applicable
        Else
            Check page for query terms and create..
              ..context strings if found
            Print page information and context strings if all..
              ..query terms are found and duplicate context..
              ..strings have not been encountered before
        Endif
    End loop
    Re-rank pages using proximity and term frequency..
      ..information
    Print page information and context strings for pages..
      ..which contained some but not all query terms
    Print page information for pages which contained no..
      ..query terms
    Print page information and context strings for pages..
      ..which contain duplicate context strings
    Print page information for pages which could not be..
      ..downloaded
    Print summary statistics.

Figure 10 shows a simplified control flow
diagram 50 of the meta search engine. The page retrieval
engine is relatively simple but does incorporate features
such as queuing requests and balancing the load from
multiple search processes, and delaying requests to the
same site to prevent overloading a site. The page
retrieval engine comprises a dispatch daemon and a number
of client retrieval processes. Pseudocode for (a
simplified version of) the dispatch daemon is as follows:

    Start clients
    Loop
        Check for timeout of active clients
        Send any queued requests if possible, balancing..
          ..load for requests from multiple search..
          ..processes
        If there is a message from a client
            If message is "replace me" replace the..
              ..client with a new process
            If message is "done" update client..
              ..information
            If message is "status" return status
            If message is "get" then
                If all clients are busy or a request..
                  ..has been made to this site..
                  ..within the last x seconds then..
                  ..queue the request
                Otherwise send request to a client
            Endif
        Endif
    End loop

The client processes simply retrieve the 
relevant pages, handling errors and timeouts, and return 
the pages directly to the appropriate search process. 
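
The queueing rule in the dispatch daemon ("queue the request if all clients are busy or the site was contacted within the last x seconds") can be sketched in Python as below. The class name `Dispatcher`, its method names, and the injectable `clock` are assumptions for illustration. The sketch covers only the "get" and "done" messages and the sending of queued requests; client processes, timeouts, "replace me" and "status" messages, and load balancing between search processes are omitted.

```python
import time
from collections import deque
from urllib.parse import urlsplit


class Dispatcher:
    """Simplified request dispatcher with a per-site delay."""

    def __init__(self, n_clients, site_delay, clock=time.monotonic):
        self.free_clients = n_clients
        self.site_delay = site_delay   # the "x seconds" between hits on one site
        self.clock = clock
        self.last_request = {}         # host -> time of the last dispatched request
        self.queue = deque()

    def _can_send(self, url):
        last = self.last_request.get(urlsplit(url).netloc)
        recently_hit = last is not None and self.clock() - last < self.site_delay
        return self.free_clients > 0 and not recently_hit

    def _send(self, url):
        self.free_clients -= 1
        self.last_request[urlsplit(url).netloc] = self.clock()
        return url

    def get(self, url):
        """Handle a "get" message; return the URL if dispatched, else None."""
        if self._can_send(url):
            return self._send(url)
        self.queue.append(url)
        return None

    def done(self):
        """Handle a "done" message: one client is free again."""
        self.free_clients += 1

    def flush(self):
        """Send any queued requests that are now allowed, keeping the rest queued."""
        sent = []
        for _ in range(len(self.queue)):
            url = self.queue.popleft()
            if self._can_send(url):
                sent.append(self._send(url))
            else:
                self.queue.append(url)
        return sent
```

Passing the clock in as a parameter is a design choice made here so that the delay logic can be exercised without real waiting.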



The algorithm used for image meta search in the
meta search engine of this invention is as follows:

    Process the request to check syntax and create..
      ..regular expressions which are used to match..
      ..query terms
    Send requests (modified appropriately) to all..
      ..relevant image search engines
    Loop for each page retrieved until maximum number of..
      ..images or all pages retrieved
        If page is from a search engine
            Parse search engine response extracting..
              ..hits and any link for the next set..
              ..of results
            Send requests for all of the hits
            Send request for the next set of results..
              ..if applicable
        Else if page is an image
            Add image to the display queue
        Else
            Analyze query term locations in the page..
              ..and predict which (if any) of the..
              ..images on the page corresponds to..
              ..the query - send a request to..
              ..download this image
        Endif
        If n images are in the display queue
            Create a single image montage of the..
              ..images in the queue
            Display the montage as a clickable image..
              ..where each portion of the image..
              ..corresponding to the original..
              ..individual images shows a detail..
              ..page for the original image
        Endif
    End loop
    If any images are in the display queue
        Create a single image montage of the images in..
          ..the queue
        Display the montage as a clickable image where..
          ..each portion of the image corresponding..
          ..to the original individual images shows..
          ..a detail page for the original image
    Endif
    Print summary statistics

Figure 11 shows a simplified control flow 
diagram 60 for the image meta search algorithm. 

Image Classification 

The Web image search engine WebSeer attempts to
classify images as photographs or graphics. WebSeer
extracts a number of features from the images and uses
decision trees for classification. We have implemented a
similar image classification system. However, we use a
different feature set and use a neural network for
classification. Figures 12 and 13 show the response of
the meta search engine of this invention to the image
query koala, with the images filtered for photos. Figure
14 shows the response when filtering for graphics.

Document Clustering

Document clustering methods typically produce
non-overlapping clusters. For example, Hierarchical
Agglomerative Clustering (HAC) algorithms, which are the
most commonly used algorithms for document clustering
(Willett, P. (1988), "Recent trends in hierarchical
document clustering: a critical review", Information
Processing and Management 24, 577-597), start with each
document in its own cluster and iteratively merge clusters
until a halting criterion is met. HAC algorithms employ
similarity functions (between documents and between sets
of documents).

A document clustering algorithm is disclosed 
herein which is based on the identification of 
co-occurring phrases and conjunctions of phrases. The 
algorithm is fundamentally different to commonly used 
methods in that the clusters may be overlapping, and are 
intended to identify common items or themes. 

The World Wide Web (the Web) is large, contains
a lot of redundancy, and has a relatively low signal to noise
ratio. These factors make finding information on the Web
difficult. The clustering algorithm presented here is
designed as an aid to information discovery, i.e. out of
the many hits returned for a given query, what topics are
covered? This allows a user to refine their query in
order to investigate one of the subtopics.

The clustering algorithm is as follows:

    Retrieve pages corresponding to the query
    For each page
        For n = 1 to MaximumPhraseLength
            For each set of successive n words
                If this combination of words has not..
                  ..already appeared in this..
                  ..document then add the set to a..
                  ..hash table for this document..
                  ..and a hash table for all..
                  ..documents
            End for
        End for
    End for
    For n = MaximumPhraseLength to 1
        Find the most common phrases of length n, to a..
          ..maximum of MaxN phrases, which occurred..
          ..more than MinN times
        Add these phrases to the set of clusters
    End for
    Find the most common combinations of two clusters..
      ..from the previous step, to a maximum of MaxC..
      ..combinations, for which the combination..
      ..occurred in individual documents at least..
      ..MinC times
    Delete clusters which are identified by phrases..
      ..which are subsets of a phrase identifying..
      ..another cluster
    Merge clusters which contain identical documents
    Display each cluster along with context from a set..
      ..of pages for both the query terms..
      ..and the cluster terms.

Figure 15 shows the clusters 70 produced by
this algorithm for the query "joydeep ghosh", and figure
16 shows the first two cluster summaries 72 and 74 for
these clusters. Figures 17 and 18 show the clusters 76
and 80 produced by HuskySearch for the same query.
Figure 19 shows the clusters 82 produced by AltaVista.
Figures 20 and 21 show the clusters 84 and 86 produced by
the meta search engine of this invention for another two
queries: "neural network" and typing and injury.


Query Expansion 

One method of performing query expansion is to
augment the query with morphological variants of query
terms. Word stemming (Porter, M.F. (1980), "An algorithm
for suffix stripping", Program 14, 130-137) can be used
in order to treat morphological variants of a word as
identical words. Web search engines typically do not
perform word stemming, despite the fact that it would
reduce the resources required to index the Web. One
reason for the lack of word stemming by Web search
engines is that stemming can reduce precision. Stemming
considers all morphological variants. Query expansion
using all morphological variants often results in reduced
precision for Web search because the morphological
variants often refer to a different concept. Reduced
precision using word stemming is typically more
problematic on the Web as compared to traditional
information retrieval test collections, because the Web
database is larger and more diverse. A query expansion
algorithm is disclosed herein which is based on the use
of only a subset of morphological variants.
Specifically, the algorithm uses the subset of
morphological variants which occur on a certain
percentage of the Web pages matching the original query.
Currently, the query terms are stemmed with the Porter
stemmer (Porter, M.F. (1980), "An algorithm for suffix
stripping", Program 14, 130-137) and the retrieved pages
can be searched for morphological variants of the query
terms. Variants which occur on greater than 1% of the
pages are displayed to the user for possible inclusion in
a subsequent query. No quantitative evaluation of this
technique has been performed; however, observation
indicates that useful terms are suggested. As an
example, for the query nec and "digital watermark", the
following terms are suggested for query expansion:
digitally, watermarking, watermarks, watermarked.
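
The selection of expansion terms can be sketched as below. The function `crude_stem` is a deliberately rough stand-in for the Porter stemmer and is not the real algorithm; it, the function name `expansion_terms`, and the word tokenization are assumptions for illustration. A real stemmer can be supplied through the `stem` parameter.

```python
import re
from collections import Counter


def crude_stem(word):
    """Very rough suffix stripper (assumed stand-in for the Porter stemmer)."""
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[:-len(suffix)]
    return word


def expansion_terms(query_terms, pages, stem=crude_stem, min_fraction=0.01):
    """Return variants of the query terms found on more than `min_fraction` of pages."""
    originals = {t.lower() for t in query_terms}
    stems = {stem(t) for t in originals}
    page_count = Counter()
    for text in pages:
        words = set(re.findall(r"[a-z]+", text.lower()))
        # Count each variant at most once per page.
        page_count.update(w for w in words
                          if w not in originals and stem(w) in stems)
    threshold = min_fraction * len(pages)
    return sorted(w for w, count in page_count.items() if count > threshold)
```

For the query terms digital and watermark, pages containing words such as "digitally" or "watermarking" would yield those words as suggestions, while variants that appear on too few pages are left out.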

Currently the technique does not automatically
expand a query when first entered, because the query
expansion terms are not known until the query is
complete. However, the technique can be made automatic by
maintaining a database of expansion terms for each query
term. The first query containing a term can add the
co-occurring morphological variants to the database, and
subsequent queries can use these terms, and update the
database if required.

Specific Expressive Forms 

Accurate information retrieval is difficult due
to the possibility of information represented in many
ways - requiring an optimal retrieval system to
incorporate semantics and understand natural language.
Research in information retrieval often considers
techniques aimed at improving recall, e.g. word stemming
and query expansion. As mentioned earlier, it is
possible for these techniques to decrease precision,
especially in a database as diverse as the Web.

The World Wide Web contains a lot of 
redundancy. Information is often contained multiple 
times and expressed in different forms across the Web. 
In the limit where all information is expressed in all 
possible ways, high precision information retrieval would 
be simple and would not require semantic knowledge - one 
would only need to search for one particular way of 
expressing the information. While such a goal will never 
be reached for all information, experiments indicate that 
the Web is already sufficient for an approach based on 
this idea to be effective for certain retrieval tasks. 

The method of this invention is to transform
queries in the form of a question into specific forms
for expressing the answer. For example, the query "What
does NASDAQ stand for?" is transformed into the query
"NASDAQ stands for" "NASDAQ is an abbreviation" "NASDAQ
means". Clearly the information may be contained in a
different form to these three possibilities; however, if
the information does exist in one of these forms, then
there is a high likelihood that finding these phrases
will provide the answer to the query. The technique thus
trades recall for precision. The meta search engine of
this invention currently uses the specific expressive
forms (SEF) technique for the following queries (square
brackets indicate alternatives and parentheses indicate
optional terms or alternatives):

• What [is | are] x? 

• What [causes | creates | produces] x? 

• What do you think [about | of | regarding] x? 

• What does x [stand for | mean]? 

• Where is x? 

• Who is x? 

• [Why | how] [is | are] (a|the) x y? 

• Why do x? 

• When is x? 

• When do x? 

• How [do | can] i x? 

• How (can) [a | the] x y? 

• How does [a | the] x y? 

As an example of the transformations, "What 
does x [stand for|mean]?" is converted to "x stands for" 
"x is an abbreviation" "x means", and "What 
[causes | creates | produces] x?" is transformed to "x is 
caused" "x is created" "causes x" "produces x" "makes x" 
"creates x" 

Different search engines use different stop 
words and relevance measures, and this tends to result in 
some engines returning many pages not containing the 
SEFs. The offending phrases are therefore filtered out 
from the queries for the relevant engines. 
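
As a rough illustration of the two transformations given as examples above, the following sketch rewrites a question into quoted answer phrases. The rule table `SEF_RULES`, the function name, and the use of regular expressions are assumptions made for this example; only the two patterns and their output phrases are taken from the text.

```python
import re

# Hypothetical rule table: (question pattern, answer-phrase templates).
SEF_RULES = [
    (re.compile(r"^what does (?P<x>.+) (?:stand for|mean)\?$", re.I),
     ['"{x} stands for"', '"{x} is an abbreviation"', '"{x} means"']),
    (re.compile(r"^what (?:causes|creates|produces) (?P<x>.+)\?$", re.I),
     ['"{x} is caused"', '"{x} is created"', '"causes {x}"',
      '"produces {x}"', '"makes {x}"', '"creates {x}"']),
]


def to_sef_query(question):
    """Rewrite a question as quoted answer phrases, or return it unchanged."""
    q = question.strip()
    for pattern, templates in SEF_RULES:
        match = pattern.match(q)
        if match:
            x = match.group("x")
            return " ".join(t.format(x=x) for t in templates)
    return q  # no rule applies: leave the query as entered
```

Per-engine filtering of phrases, as described in the preceding paragraph, could be layered on top by dropping selected templates before joining them.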

Figure 22 shows at 90 the response of the meta
search engine of this invention for the query "What does
NASDAQ stand for?" The answer to the query is contained
in the local context displayed for about 5 out of the
first 6 pages. Figure 23 shows at 92 the response of
Infoseek to the same query. The answer to the query is
not displayed in the page summaries, and which, if any,
of the pages contains the answer is not clear. Figures
24 and 25 show, at 94 and 96, the meta search engine of
this invention and Infoseek responses to the query "How
is a rainbow created?" Again, the answer is contained in
the local context shown by the meta search engine of this
invention but it is not clear which, if any, of the pages
listed by Infoseek contain the answer to the question.
Figure 26 shows at 100 a third example of the response
from the meta search engine of the invention for the
query "What is a mealy machine?"

It is reasonable to expect that the amount of
easily accessible information will increase over time,
and therefore that the viability of the specific
expressive forms technique will improve over time. An
extension of the above-discussed procedures is to define
an order over the various SEFs, e.g. "x stands for" may
be more likely to find the answer to "What does x stand
for" than the phrase "x means". If none of the SEFs are
found then the engine could fall back to a standard
query.

Search tips may be provided by the meta engine.
These tips may include, for example, the following:

• Use quotes for phrases, e.g. "nec research". 

• You can hide the various options above to
save screen space by clicking on the "hide" links.

• Window option: clicking on a hit brings up
the page in the same window for multiple searches or a
new window for each new search.

• Filter option: filters pages when viewed to
highlight query terms. Faster due to local caching of
the page.

• The letter(s) after the page titles identify
the search engine which provided the result (e.g.
A=AltaVista).

• The second field after the page titles is the
time since the page was last updated (e.g. 5m=5 months,
1y=1 year).

• The third field after the page titles is the
size of the page.

• The context option selects the number of
characters to display either side of the query terms.

• The timeout option is the maximum time to
download each individual page.

• Searching in "Press" is useful for higher
precision with current news topics.

• Image option: remove images from the pages
when viewed (for faster viewing).

• When viewing a filtered page, clicking on a
query term jumps to the next occurrence of that term.
Clicking on the last occurrence of a term jumps back to
the first occurrence.

• You can use "-term" to exclude a term.

• You can search for links to a specific page,
e.g. link:www.neci.nj.nec.com/homepages/giles. Self
links are excluded.

• When in doubt use lower case.

• This meta engine makes more than three times
as many documents available as a single search engine.
Constraining your search can help, e.g. if you want to
know what NASDAQ stands for, searching for "NASDAQ stands
for" rather than "NASDAQ" can find your answer faster
although the information may also be expressed in
alternative ways.

• Clicking on the search engine links in the
"Searching for:" line will show the search engine
response to the current query.

• You can search for images by selecting the
"images" button, e.g. "red rose".

• The bar to the left of the titles is longer
when the query terms are closer together in the document.

• The query term links in the "Searching for"
line lead to the Webster dictionary definitions.

• If you select Tracking: Yes, then your query
will be tracked and new hits will be displayed on your
customized home page similar to the "recent articles
about NEC Research".

• Select Cluster: Yes to cluster the documents
and identify common themes.

• You can filter images using a neural network
prediction of whether each image is a photo or a graphic
using the Images: option.

• A listing of pages ranked by term proximity 
is shown after all of the documents have been retrieved. 

Tracking Queries and URLs

Services such as the Informant (The Informant,
1997) track the response of Web search engines to
queries, and inform users when new documents are found.
The meta search engine of this invention supports this
function. Tracking is initiated for a query by selecting
the Track option when performing the query. A daemon
then repeats the query periodically, storing new
documents along with the time they were found. New
documents are presented to the user on the home page of
the search engine, as shown at 102 in figure 27. The
engine does not currently inform users if the documents
matching queries have changed, although this could be
added.

The meta search engine of this invention also
supports tracking URLs. Tracking is initiated by
clicking the [Track page] link when viewing one of the
pages from the search engine results. Alternatively,
tracking may be initiated for an arbitrary URL using the
options page. A daemon identifies updates to the pages
being tracked, and shows a list of modified pages to the
user on the home page, as in figure 27. The [Page] link
displays the page being tracked and inserts a header at
the top showing which lines have been added or modified
since the last time the user viewed the page (e.g. see
figure 28).
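
One way the added or modified lines of a tracked page might be identified is sketched below using Python's standard `difflib` module. This is an assumption made for illustration; the text does not say how the daemon compares page versions, and the function name is hypothetical.

```python
import difflib


def changed_lines(old_text, new_text):
    """Return (added_or_modified, removed) lines between two page versions."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    added, removed = [], []
    for line in difflib.ndiff(old_lines, new_lines):
        if line.startswith("+ "):      # present only in the new version
            added.append(line[2:])
        elif line.startswith("- "):    # present only in the old version
            removed.append(line[2:])
    return added, removed
```

The `added` list could then be summarized in a header placed at the top of the page when it is displayed to the user.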

Estimating the Coverage of Search Engines 
and the Size of the Web 

As the World Wide Web continues to expand, it
is becoming an increasingly important resource for
scientists. Immediate access to all scientific
literature has long been a dream of scientists, and the
Web search engines have made a large and growing body of
scientific literature and other information resources
easily accessible. The major Web search engines are
commonly believed to index a large proportion of the Web.
Important questions which impact the choice of search
methodology include: What fraction of the Web do the
search engines index? Which search engine is the most
comprehensive? How up to date are the search engine
databases?

A number of search engine comparisons are
available. Typically, these involve running a set of
queries on a number of search engines and reporting the
number of results returned by each engine. Results of
these comparisons are of limited value because search
engines can return documents which do not contain the
query terms. This may be due to (a) the information
retrieval technology used by the engine, e.g. Excite uses
"concept-based clustering" and Infoseek uses morphology -
these engines can return documents with related words,
(b) documents may no longer exist - an engine which never
deletes invalid documents would be at an advantage, and
(c) documents may still exist but may have changed and no
longer contain the query terms.

Selberg and Etzioni (Selberg, E. and Etzioni,
O. (1995), Multi-service search and comparison using the
MetaCrawler, in "Proceedings of the 1995 World Wide Web
Conference".) have presented results based on the usage
logs of the MetaCrawler meta search service (due to
substantial changes in the search engine services and the
Web, it is expected that their results would be
significantly different if repeated now). These results
considered the following engines: Lycos, WebCrawler,
InfoSeek, Galaxy, Open Text, and Yahoo. Selberg and
Etzioni's results are informative but limited for several
reasons.

First, they present the "market share" of each
engine which is the percentage of documents that users
follow that originated from each of the search engines.
These results are limited for a number of reasons,
including (a) relevance is difficult to determine without
viewing the pages, and (b) presentation order affects
user relevance judgements (Eisenberg, M. and Barry, C.
(1986), Order effects: A preliminary study of the
possible influence of presentation order on user
judgements of document relevance, in "Proceedings of the
49th Annual Meeting of the American Society for
Information Science", Vol. 23, pp. 80-86).

The results considered by Selberg and Etzioni
are also limited because they present results on the
percentage of unique references returned and the coverage
of each engine. Their results suggest that each engine
covers only a fraction of the Web; however, their results
are limited because (a) as above, engines can return
documents which do not contain the query terms - engines
which return documents with related words or invalid
documents can result in significantly different results,
and (b) search engines return documents in different
orders, meaning that all documents need to be retrieved
for a valid comparison, e.g. two search engines may index
exactly the same set of documents yet return a different
set as the first x.

In addition, Selberg and Etzioni find that the
percentage of invalid links was 15%. They do not break
this down by search engine. Selberg and Etzioni do point
out limitations in their study (which is just a small
part of a larger paper on the very successful MetaCrawler
service).

AltaVista and Infoseek have recently confirmed
that they do not provide comprehensive coverage on the
Web (Brake, D. (1997), "Lost in cyberspace", New
Scientist 154(2088), 12-13). Discussed below are
estimates on how much they do cover.

We have produced statistics on the coverage of
the major Web search engines, the size of the Web, and
the recency of the search engine databases. Only the 6
current major full-text search engines are considered
herein (in alphabetical order): AltaVista, Excite,
HotBot, Infoseek, Lycos, and Northern Light. A common
perception is that these engines index roughly the same
documents, and that they index a relatively large portion
of the Web.

We first compare the number of documents
returned when using different combinations of 1 to 6
search engines. Our overall methodology is to retrieve
the list of matching documents from all engines and then
retrieve all of the documents for analysis. Two
important constraints were used.

The first constraint was that the entire list
of documents matching the query must have been retrieved
for all of the search engines in order for a query to be
included in the study. This constraint is important
because the order in which the engines rank documents
varies between engines. Consider a query which resulted
in greater than 1,000 documents from each engine. If we
only compared the first 200 documents from each engine we
may find many unique URLs. However, we would not be able
to determine if the engines were indexing unique URLs, or
if they were indexing the same URLs but returning a
different subset as the first 200 documents.

The second constraint was that for all of the
documents that each engine lists as matching the query,
we attempted to download the full text of the
corresponding URL. Only documents which could be
downloaded and which actually contain the query terms are
counted. This is important because (a) some engines can
return documents which they believe are relevant but do
not contain the query terms (e.g. Excite uses
"concept-based clustering" and may consider related words, and
Infoseek uses morphology), and (b) each search engine
contains a number of invalid links, and the percentage of
invalid links varies between the search engines (engines
which do not delete invalid links would be at an
advantage).

Other details important to the analysis are:

1. Duplicates are removed when considering the
total number of documents returned by one engine or by a
combination of engines, including detection of identical
pages with different URLs. URLs are normalized by
a) removing any "index.html" suffix or trailing "/", b)
removing a port 80 designation (the default), c) removing
the first segment of the domain name for URLs with a
directory depth greater than 1 (in order to account for
machine aliases), and d) unescaping any "escaped"
characters (e.g. %7E in a URL is equivalent to the tilde
character).

2. We consider only lowercase queries because
different engines treat capitalized queries differently
(e.g. AltaVista returns only capitalized results for
capitalized queries).

3. We used an individual page timeout of 60
seconds. Pages which timed out were not included in the
analysis.

4. We use a fixed maximum of 700 documents per
query (from all engines combined after the removal of
duplicates) - queries returning more documents were not
included. The search engines typically impose a maximum
number of documents which can be retrieved (current
limits are AltaVista 200, Infoseek 500, HotBot 1,000,
Excite 1,000, Lycos 1,000, and Northern Light > 10,000)
and we checked to ensure that these limits were not
exceeded (using this constraint no query returned more
than the maximum from each engine, notably no query
returned more than 200 documents from AltaVista).

5. We only counted documents which contained
the exact query terms, i.e. the word "crystals" in a
document would not match a query term of "crystal" - the
non-plural form of the word would have to exist in the
document in order for the document to be counted as
matching the query. This is necessary because different
engines use different morphology rules.

6. HotBot and AltaVista can identify alternate
pages with the same information on the Web. These
alternate pages are included in the statistics (as they
are for the engines which do not identify alternate pages
with the same data).

7. The "special collection" (premier documents
not part of the publicly indexable Web) of Northern Light
was not used.
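
The URL normalization rules in item 1 and the exact-term matching in item 5 can be sketched as follows. This is a minimal illustration, not the implementation used for the study. Several points are assumptions made here: "directory depth" is taken to be the number of path segments left after rule (a), the first host segment is only dropped when at least two dots remain in the host name, and the function names are hypothetical.

```python
import re
from urllib.parse import unquote, urlsplit


def normalize_url(url):
    """Apply normalization rules (a)-(d) from item 1 to a single URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = unquote(parts.path)                   # (d) unescape, e.g. %7E -> ~
    if path.endswith("/index.html"):             # (a) drop "index.html" ...
        path = path[: -len("index.html")]
    path = path.rstrip("/")                      #     ... and any trailing "/"
    depth = len([seg for seg in path.split("/") if seg])
    if depth > 1 and host.count(".") >= 2:       # (c) drop machine alias (assumed guard)
        host = host.split(".", 1)[1]
    port = parts.port
    netloc = host if port in (None, 80) else f"{host}:{port}"  # (b) drop port 80
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme}://{netloc}{path}{query}"


def contains_exact_terms(text, terms):
    """True if every query term occurs as a whole word, with no stemming."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return all(term.lower() in words for term in terms)
```

Two URLs would be treated as duplicates when `normalize_url` returns the same string for both. Detection of identical pages under unrelated URLs would need a separate content comparison, which is not shown.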

Over a period of time, we have collected 500
queries which satisfy the constraints. For the results
presented herein, we performed the 500 queries during the
period 8/23/97 to 8/24/97. We manually checked that all
results were retrieved and parsed correctly from each
engine before and after the tests because the engines
periodically change their formats for listing documents
and/or requesting the next page of documents (we also use
automatic methods designed to detect temporary failures
and changes in the search engine response formats).

Figure 29 shows the fraction of the total
number of documents from the 6 engines which were
retrieved by each individual engine. Table 1 below shows
these results along with the 95% confidence interval.
HotBot is the most comprehensive in this comparison.
These results are specific to the particular queries
performed and the state of the engine databases at the
time they were performed. Also, the results may be
partly due to different indexing rather than different
database sizes - different engines may not index
identical words for the same documents (for example, the
engines typically impose a maximum file size and
effectively truncate oversized documents).

TABLE 1

Search Engine     Coverage WRT    95% confidence
                  6 Engines       interval
HotBot            39.2%           + .4%
Excite            31.1%           +/-1.2%
Northern Light    30.4%           +/-1.3%
AltaVista         29.2%           .2%
Infoseek          17.9%           +/-1.1%
Lycos             12.2%           +/-1.1%

Figure 30 shows the average fraction of
documents retrieved by 1 to 6 search engines normalized
by the number retrieved from all six engines. For 1 to 5
engines, the average is over all combinations of the
engines, which is averaged for each query and then
averaged over queries. Using the assumption that the
coverage increases logarithmically with the number of
search engines, and that, in the limit, an infinite
number of search engines would cover the entire Web, f(x)
= b(1 - 1/exp(ax)), where a and b are constants and x
is the number of search engines, was fit to the data
(using Levenberg-Marquardt minimization (Fletcher, R.
(1987), Practical Methods of Optimization, Second
Edition, John Wiley & Sons) with the default parameters
in the program gnuplot) and plotted on figure 30. This
is equivalent to the assumption that each engine covers a
certain fixed percentage of the Web, and each engine's
sample of the Web is drawn independently from all Web
pages ($c_i = c_{i-1} + c_1(1 - c_{i-1})$, $i = 2 \ldots n$,
where $c_i$ is the coverage of i engines and $c_1$ is the
coverage of one engine).
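
The stated equivalence between the fitted curve and the independence assumption can be checked numerically with a short sketch. It assumes b = 1 (coverage expressed as a fraction of the whole Web) and a = -ln(1 - c_1); the function names are hypothetical, and no fitting to the measured data is attempted here.

```python
import math


def coverage_recurrence(c1, n):
    """Coverage of 1..n independent engines, each covering fraction c1,
    using c_i = c_{i-1} + c_1 * (1 - c_{i-1})."""
    coverage = [c1]
    for _ in range(2, n + 1):
        prev = coverage[-1]
        coverage.append(prev + c1 * (1 - prev))
    return coverage


def coverage_closed_form(c1, x, b=1.0):
    """f(x) = b * (1 - 1/exp(a*x)) with a = -ln(1 - c1) (assumed mapping)."""
    a = -math.log(1 - c1)
    return b * (1 - 1 / math.exp(a * x))
```

For example, with c_1 = 0.5 the recurrence gives 0.5, 0.75, 0.875 for one, two and three engines, which agrees with 1 - (1 - c_1)^x.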

There are a number of important biases which
should be considered. Search engines typically do not
consider indexing documents which are hidden behind
search forms, and documents where the engines are
excluded by the robots exclusion standard, or by
authentication requirements. Therefore, we expect the
true size of the Web to be much larger than estimated
here. However, search engines are unlikely to start
indexing these documents, and it is therefore of interest
to estimate the size of the Web that they do consider
indexing (hereafter referred to as the "indexable Web"),
and the relative comprehensiveness of the engines.

The logarithmic extrapolation above is not
accurate for determining the size of the indexable Web
because (a) the amount of the Web indexed by each engine
varies significantly between the engines, and (b) the
search engines do not sample the Web independently. All
of the 6 search engines offer a registration function for
users to register their pages. It is reasonable to
assume that many users will register their pages at
several of the engines. Therefore the pages indexed by
each engine will be partially dependent. A second source
of dependence between the sampling performed by each
engine comes from the fact that search engines are
typically biased towards indexing pages which are linked
to in other pages, i.e. more popular pages.

Consider the overlap between engines a and b
in figure 31. Assuming that each engine samples the
Web independently, the quantity $n_o/n_b$, where $n_o$ is the
number of documents returned by both engines and $n_b$ is
the number of documents returned by engine b, is an
estimate of the fraction of the indexable Web, $p_a$,
covered by engine a. Using the coverage of 6 engines as
a reference point we can write $p'_a = n_a/n_6$, where
$n_a$ is the number of documents returned by engine a and $n_6$
is the number of unique documents returned by the
combination of 6 engines. Thus, $p'_a$ is the coverage of
engine a with respect to the coverage of the 6
engines, and we can write $c = p'_a/p_a = n_a n_b/(n_6 n_o)$.
We use this
equation to estimate the size of the Web in relation to
the amount of the Web covered by the 6 engines considered
here. Because the size of the engines varies
significantly, we consider estimating the value of c
using combinations of two engines, from the smallest two
to the largest two. We limit this analysis to the 245
queries returning > 50 documents (to avoid difficulty
when $n_o = 0$). Table 2 shows the results. Values of c
smaller than 1 suggest that the size of the indexable Web
is smaller than the number of documents retrieved from
all 6 engines. It is reasonable to expect that larger
engines will have lower dependence because a) they can
index more pages other than the pages which users
register, and b) they can index more of the less popular
pages on the Web. Indeed, there is a clear trend where
the estimated value of c increases with the larger
engines.



TABLE 2

Search Engines                Engine Sizes   Estimated c   95% confidence
                                                           interval
Lycos & Infoseek              Smallest       0.6           +/-0.04
Infoseek & AltaVista                         0.9           +/-0.06
AltaVista & Northern Light       ->          0.9           +/-0.04
Northern Light & Excite                      1.9           +/-0.12
Excite & HotBot               Largest        2.2           +/-0.17

Using c = 2.2, from the comparison with the
largest two engines, we can estimate the fraction of the
indexable Web which the engines cover: HotBot 17.8%,
Excite 14.1%, Northern Light 13.8%, AltaVista 13.3%,
Infoseek 8.1%, Lycos 5.5%. These results are shown at
120 in figure 32. The percentage of the indexable Web
indexed by the major search engines is much lower than is
commonly believed. We note that (a) it is reasonable to
expect that the true value of c is actually larger than
2.2 due to the dependence which remains between the two
largest engines, and (b) different results may be found
for queries from a different class of users.

HotBot reportedly contains 54 million pages,
putting our estimate of a lower bound for the size of the
indexable Web at approximately 300 million pages.
Currently available estimates of the size of the Web vary
significantly. The Internet Archive uses an estimate of
80 million pages (excluding images, sounds, etc.)
(Cunningham, M. (1997), "Brewster's millions",
http://www.irish-times.com/irish-times/paper/1997/0127/cmpl.html).
Forrester Research
estimates that there are more than 75 million pages
(Guglielmo, C. (1997), "Mr. Kurnit's neighborhood", Upside
September). AltaVista now estimates that the Web
contains 100 to 150 million pages (Brake, D. (1997),
"Lost in cyberspace", New Scientist 154(2088), 12-13).
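
The arithmetic behind the coverage figures and the 300 million page lower bound quoted above can be reproduced with a short sketch. The relative coverage values are those of Table 1, and c = 2.2 and the 54 million page figure are taken from the text; the function and variable names are hypothetical.

```python
def estimate_c(n_a, n_b, n_o, n_6):
    """c = p'_a / p_a = (n_a * n_b) / (n_6 * n_o) for engines a and b."""
    if n_o == 0:
        raise ValueError("no overlap between the two engines")
    return (n_a * n_b) / (n_6 * n_o)


# Coverage relative to the six engines combined (Table 1), in percent.
RELATIVE_COVERAGE = {
    "HotBot": 39.2, "Excite": 31.1, "Northern Light": 30.4,
    "AltaVista": 29.2, "Infoseek": 17.9, "Lycos": 12.2,
}


def absolute_coverage(c):
    """Estimated percentage of the indexable Web covered by each engine."""
    return {name: round(p / c, 1) for name, p in RELATIVE_COVERAGE.items()}


def web_size_lower_bound(engine_pages, engine_fraction):
    """Pages in the indexable Web implied by one engine's size and coverage."""
    return engine_pages / engine_fraction
```

With c = 2.2, `absolute_coverage` returns 17.8% for HotBot, and `web_size_lower_bound(54e6, 0.178)` gives roughly 303 million pages, consistent with the approximate figure of 300 million.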

A simple analysis of page retrieval times leads
to some interesting conclusions. Table 3 below shows the
median time for each of the six major search engines to
respond, along with the median time for the first of the
six engines to respond when queries are made
simultaneously to all engines (as happens in the meta
engine).

TABLE 3

Search Engine                         Median time for
                                      response (seconds)
AltaVista                             0.9
Infoseek                              1.3
HotBot                                2.6
Excite                                5.2
Lycos                                 2.8
Northern Light                        7.5
All engines                           2.7
First of 6 engines                    0.8
First result from the meta search     1.3
engine of this invention


Histograms of the response times for these
engines and the first of 6 engines are shown in figures
33 and 34, and the median times are shown in figure 35.
Figure 36 shows the median time for the first of n
engines to respond. These results are from September
1997, and we note that the relative speed of the search
engines varies over time.

Looking now at the time to download arbitrary
Web pages, figure 37 shows a histogram of the response
time. Figure 38 shows the median time for the first of n
engines to respond. We can estimate the time for the
meta engine to display the first result, which we create
by sampling from the distributions for the first of 6
search engines (the meta engine actually uses more than 6
search engines but we concentrate on the major Web
engines here), and the first of 10 Web pages (the actual
number depends on the number returned by the first engine
to respond), adding these together, and averaging over
10,000 trials.

Figure 39 shows a histogram of this
distribution. The median of the distribution is 1.3
seconds (compared to 2.7 seconds for the median response
time of a search engine even without downloading any
actual pages). For comparison, the average time
MetaCrawler takes to return results is 25.7 seconds
(without page verification) or 139.3 seconds (with page
verification) (Selberg, E. and Etzioni, O. (1995),
Multi-service search and comparison using the MetaCrawler, in
"Proceedings of the 1995 World Wide Web Conference") (the
underlying search engines and/or the Web appear to be
significantly faster than they were when Selberg and
Etzioni performed their experiment).
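
The sampling procedure described above might look roughly like the following sketch. It assumes the two empirical distributions are available as lists of measured times (the first of 6 engines to respond, and the first of 10 pages to download); the function name, the parameter names, and the fixed random seed are hypothetical and chosen only to make the example reproducible.

```python
import random
import statistics


def simulate_first_result(first_engine_times, first_page_times,
                          trials=10_000, seed=0):
    """Sample the time to the first displayed result by adding a draw from
    the first-engine distribution to a draw from the first-page distribution."""
    rng = random.Random(seed)
    return [
        rng.choice(first_engine_times) + rng.choice(first_page_times)
        for _ in range(trials)
    ]


def median_first_result(first_engine_times, first_page_times, **kwargs):
    """Median of the simulated distribution."""
    return statistics.median(
        simulate_first_result(first_engine_times, first_page_times, **kwargs)
    )
```

A histogram of the list returned by `simulate_first_result` would correspond to the kind of distribution plotted in figure 39, given the measured samples as input.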

Therefore, on average we find that the parallel
architecture of the meta engine of this invention allows
it to find, download and analyze the first page faster
than the standard search engines can produce a result
although the standard engines do not download and analyze
the pages. Note that the results in this section are
specific to the particular queries performed (speed as a
function of the query is different for each engine) and
the network conditions under which they were performed.
These factors may bias the results towards certain
engines. The non-stationarity of Web access times is not
considered here, e.g. the speed of the engines varies
significantly over time (short term variations may be due
to network or machine problems and user load, long term
variations may be due to modifications in the search
engine software, the search engine hardware resources, or
relevant network connections).

The meta search engine of this invention
demonstrates that real-time analysis of documents
returned from Web search engines is feasible. In fact,
calling the Web search engines and downloading Web pages
in parallel allows the meta search engine of this
invention to, on average, display the first result
quicker than using a standard search engine.

User feedback indicates that the display of
real-time local context around query terms, and the
highlighting of query terms in the documents when viewed,
significantly improves the efficiency of searching the
Web.

Our experiments indicate that an upper bound on
the coverage of the major search engines varies from 6%
(Lycos) to 18% (HotBot) of the indexable Web. Combining
the results of six engines returns more than 3.5 times as
many documents when compared to using only one engine.
By analyzing the overlap between search engines, we
estimate that an approximate lower bound on the size of
the indexable Web is 300 million pages. The percentage
of invalid links returned by the major engines varies
from 3% to 7%. Our results provide an indication of the
relative coverage of the major Web search engines, and
confirm that, as indicated by Selberg and Etzioni, the
coverage of any one search engine is significantly
limited.

While it is apparent that the invention herein
disclosed is well calculated to fulfill the objects
previously stated, it will be appreciated that numerous
modifications and embodiments may be devised by those
skilled in the art, and it is intended that the appended
claims cover all such modifications and embodiments as
fall within the true spirit and scope of the present
invention.

What is claimed is:

1. A computer-implemented meta search engine method,
comprising the steps of:

forwarding a query to a plurality of third
party search engines;

parsing the responses from the third party
search engines in order to extract information regarding
the documents matching the query;

downloading the full text of the documents
matching the query;

locating query terms in the documents and
extracting text surrounding the query terms; and

displaying the text surrounding the query
terms.

15 

2. A method according to Claim 1, further including the 
step of progressively displaying the text surrounding the 
query terms as the documents are retrieved. 

20 3. A method according to Claim 1, further including the 

step of filtering the context strings in order to improve 
readability by removing redundant whitespace, repeated 
characters, HTML comments and tags, and special 
characters . 



25 



4. A method according to Claim 1, further including the 
step of identifying and filtering pages which no longer 
contain the query terms. 

5. A method according to Claim 1, further including the 
step of clustering the documents based on analysis of the 
full text of each document and identification of 
co-occurring phrases and words, and conjunctions thereof. 

6. A method according to Claim 1, further including the 
steps of storing the documents matching a query so that a 
query can be repeated and only showing documents which 
are new or have been modified since the last query or a 
given time. 
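
By way of non-limiting illustration, showing only new or modified documents when a query is repeated may be sketched by storing a digest of each page. The function name `new_or_modified` and the use of SHA-256 digests in place of the stored documents themselves are assumptions of the sketch.

```python
import hashlib


def new_or_modified(pages: dict[str, str], seen: dict[str, str]) -> list[str]:
    """Return URLs whose text is new or has changed since the last run.

    Hypothetical sketch: `seen` maps each URL to the digest recorded on a
    previous run and is updated in place.
    """
    changed = []
    for url, text in pages.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if seen.get(url) != digest:
            changed.append(url)
        seen[url] = digest
    return changed
```

Persisting `seen` between runs, for example to disk, would allow the same comparison to be made against "a given time" rather than only against the last query.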

7. A method according to Claim 1, further including the 
step of filtering the actual documents when viewed in 
full in order to (a) highlight the query terms, and (b) 
insert quick jump links so the user can quickly jump to 
the query term of interest. 
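
By way of non-limiting illustration, highlighting and quick jump links may be sketched as follows. The function name `highlight_terms` and the anchor naming scheme (`hit0`, `hit1`, ...) are hypothetical, and the sketch operates on plain text rather than on an HTML document. Each highlighted term links to the next occurrence, and the last occurrence links back to the first.

```python
import html
import re


def highlight_terms(text: str, terms: list[str]) -> str:
    """Wrap each query term in bold and link it to the next hit (sketch)."""
    if not terms:
        return html.escape(text)
    # Longer terms first so that a phrase wins over a word it contains.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    matches = list(pattern.finditer(text))
    parts, last = [], 0
    for i, match in enumerate(matches):
        parts.append(html.escape(text[last:match.start()]))
        nxt = (i + 1) % len(matches)
        parts.append(
            f'<a id="hit{i}" href="#hit{nxt}"><b>{html.escape(match.group())}</b></a>'
        )
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```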

8. A method according to Claim 1, further including the 
steps of creating and using a database of 
meta-information regarding query terms, e.g. storing a list of 
movie titles, recognizing when the user enters a query 
containing a movie title, and taking a special action 
such as referring the user to the review of the movie at 
a specific movie review site. 

9. A method according to Claim 1, further including the 
step of storing and using information regarding the 
particular documents requested by a user in response to a 
query, e.g. remembering the most commonly requested 
document for a given query and presenting this document 
first in response to the same query in the future. 



10. A method according to Claim 1, further including the 
steps of analyzing the number of documents which would 
have been found as a function of the number of third 
party search engines queried, and computing the estimated 
size of the third party search engines and the estimated 
size of the document base which the third party search 
engines index. 

11. A method according to Claim 1, further including the 
step of scheduling regular searches, whereby the user is 
informed of either new or modified documents since the 
previous search. 

12. A method according to Claim 1, further including the 
step of using a more advanced detection of duplicate 
documents by identifying duplicate context even when 
documents may have different headers or footers. 
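
By way of non-limiting illustration, duplicate detection based on context may be sketched by comparing the normalized context strings of the hits instead of their URLs or full text. The function name `dedupe_by_context` is hypothetical, and exact equality of the lower-cased, whitespace-normalized context is an assumption of the sketch.

```python
def dedupe_by_context(results: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep the first (url, context) pair for each distinct context.

    Hypothetical sketch: two hits are treated as duplicates when their
    contexts are identical after lower-casing and whitespace normalization,
    even if the pages differ elsewhere (e.g. in headers or footers).
    """
    seen = set()
    unique = []
    for url, context in results:
        key = " ".join(context.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append((url, context))
    return unique
```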

13. A method according to Claim 1, further including the 
step of caching the full documents in order to improve 
access speed. 


14. A method according to Claim 1, further including the 
step of using context sensitive suggestions based on the 
query entered, e.g. providing suggestions regarding how 
to search for a name when the query contains a single 
character that could represent an initial. 

15. A method according to Claim 1, further including the 
step of using a proximity based ranking scheme to re-rank 
documents according to the number of and proximity 
between query terms. 
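
By way of non-limiting illustration, a proximity based re-ranking may be sketched as follows. The scoring formula below is a hypothetical stand-in and is not the relevance measure of this disclosure: it awards one point for each query term found and a bonus for each pair of terms that grows as the smallest character distance between them shrinks.

```python
import re
from itertools import combinations


def proximity_score(text: str, terms: list[str]) -> float:
    """Score a document by the number of and proximity between query terms."""
    lowered = text.lower()
    positions = {
        t: [m.start() for m in re.finditer(re.escape(t.lower()), lowered)]
        for t in terms
    }
    found = [t for t in terms if positions[t]]
    score = float(len(found))
    for a, b in combinations(found, 2):
        gap = min(abs(i - j) for i in positions[a] for j in positions[b])
        score += 1.0 / (1.0 + gap)  # closer pairs earn a larger bonus
    return score


def rerank(docs: dict[str, str], terms: list[str]) -> list[str]:
    """Return document URLs ordered from highest to lowest score."""
    return sorted(docs, key=lambda url: proximity_score(docs[url], terms), reverse=True)
```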

16. A computer-implemented meta search engine method, 
comprising the steps of: 

forwarding a query to a third party search 
engine; 

parsing the responses from the third party 
search engine in order to extract information regarding 
the documents matching the query; 

downloading the full text of the documents 
matching the query; 

locating query terms in the documents and 
extracting text surrounding the query terms; and 

displaying the text surrounding the query 
terms. 

17. A method according to Claim 16, further including 
the step of progressively displaying the text surrounding 
the query terms as the documents are retrieved. 

18. A method according to Claim 16, further including 
the step of filtering the context strings in order to 
improve readability by removing redundant whitespace, 
repeated characters, HTML comments and tags, and special 
characters. 

19. A method according to Claim 16, further including 
the step of identifying and filtering pages which no 
longer contain the query terms. 

20. A method according to Claim 16, further including 
the step of clustering the documents based on analysis of 
the full text of each document and identification of 
co-occurring phrases and words, and conjunctions thereof. 

21. A method according to Claim 16, further including 
the steps of storing the documents matching a query so 
that a query can be repeated and only showing documents 
which are new or have been modified since the last query 
or a given time. 



22. A method according to Claim 16, further including 
the step of filtering the actual documents when viewed in 
full in order to (a) highlight the query terms, and (b) 
insert quick jump links so the user can quickly jump to 
the query term of interest. 

23. A method according to Claim 16, further including 
the steps of creating and using a database of 
meta-information regarding query terms, e.g. storing a list of 
movie titles, recognizing when the user enters a query 
containing a movie title, and taking a special action 
such as referring the user to the review of the movie at 
a specific movie review site. 

24. A method according to Claim 16, further including 
the step of storing and using information regarding the 
particular documents requested by a user in response to a 
query, e.g. remembering the most commonly requested 
document for a given query and presenting this document 
first in response to the same query in the future. 

25. A method according to Claim 16, further including 
the step of scheduling regular searches, whereby the user 
is informed of either new or modified documents since the 
previous search. 

26. A method according to Claim 16, further including 
the step of using a more advanced detection of duplicate 
documents by identifying duplicate context even when 
documents may have different headers or footers. 



27. A method according to Claim 16, further including 
the step of caching the full documents in order to 
improve access speed. 

28. A method according to Claim 16, further including 
the step of using context sensitive suggestions based on 
the query entered, e.g. providing suggestions regarding 
how to search for a name when the query contains a single 
character that could represent an initial. 

29. A method according to Claim 16, further including 
the step of using a proximity based ranking scheme to 
re-rank documents according to the number of and proximity 
between query terms. 






30. A computer-implemented keyword based image search 
engine method, comprising the steps of: 

forwarding a query to a plurality of third 
party image search engines; 

parsing the responses from the third party 
search engines in order to extract information regarding 
the images matching the query; 

downloading the images matching the query; and 

displaying thumbnails of the images to the 
user. 

31. A method according to claim 30, further including 
the step of user selectable filtering of the images based 
on size, color, or semantic attributes of the images. 

32. A method according to claim 30, further including 
the step of identifying and filtering commonly used 
images on the Web such as the Netscape Now image and 
horizontal bars used to separate sections of a document. 

33. A method according to claim 30, further including 
the step of identifying and filtering similar images. 

34. A method according to claim 30, further including 
the steps of identifying the type of an image, e.g. 
photograph, line drawing, logo, map, cartoon, portrait, 
button, chart, or astronomical pictures, and filtering 
based on the image type. 



35. A method according to claim 30, further including 
the steps of storing the images matching a query so that 
a query can be repeated, and only showing new images. 

36. A method according to claim 30, further including 
the step of storing the meta-information (e.g. type of 
image) so that images may be filtered using the 
meta-information without downloading the image again for new 
queries. 

37. A method according to claim 30, further including 
the steps of displaying the full image along with the 
document referring to it if possible, and highlighting of 
query terms in the document. 

38. A computer-implemented keyword based image search 
engine method, comprising the steps of: 

forwarding the query to a plurality of third 
party text search engines; 

parsing the responses from the third party 
search engines in order to extract information regarding 
the documents matching the query; 

downloading the documents matching the query; 

analyzing the documents and locating images 
which may match the user query based on the proximity of 
query terms to image tags or references; 

downloading the images; and 

displaying thumbnails of the images to the 
user. 
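
By way of non-limiting illustration, locating images by the proximity of query terms to image tags may be sketched as follows. The function name `candidate_images`, the `window` parameter (in characters), and the regular expression for `<img>` tags are hypothetical simplifications; the window includes the tag itself, so a term in the image file name or attributes also counts.

```python
import re

# Simplified pattern for <img ... src=...> tags (assumption of the sketch).
IMG_TAG = re.compile(r'<img\b[^>]*\bsrc\s*=\s*["\']?([^"\'\s>]+)', re.IGNORECASE)


def candidate_images(html_text: str, terms: list[str], window: int = 200) -> list[str]:
    """Return image references that have a query term within `window` characters."""
    lowered = html_text.lower()
    hits = []
    for match in IMG_TAG.finditer(html_text):
        start = max(0, match.start() - window)
        end = min(len(lowered), match.end() + window)
        nearby = lowered[start:end]
        if any(term.lower() in nearby for term in terms):
            hits.append(match.group(1))
    return hits
```

The returned references may then be resolved against the URL of the document, downloaded, and shown as thumbnails.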



39. A method according to claim 38, further including 
the step of user selectable filtering of the images based 
on size, color, or semantic attributes of the images. 

40. A method according to claim 38, further including 
the step of identifying and filtering commonly used 
images on the Web such as the Netscape Now image and 
horizontal bars used to separate sections of a document. 

41. A method according to claim 38, further including 
the step of identifying and filtering similar images. 

42. A method according to claim 38, further including 
the steps of identifying the type of an image, e.g. 
photograph, line drawing, logo, map, cartoon, portrait, 
button, chart, or astronomical pictures, and filtering 
based on the image type. 

43. A method according to claim 38, further including 
the steps of storing the images matching a query so that 
a query can be repeated, and only showing new images. 

44. A method according to claim 38, further including 
the step of storing the meta-information (e.g. type of 
image) so that images may be filtered using the 
meta-information without downloading the image again for new 
queries. 

45. A method according to claim 38, further including 
the steps of displaying the full image along with the 
document referring to it if possible, and highlighting of 
query terms in the document. 

46. A computer-implemented meta search engine 
comprising: 

means for forwarding a query to a plurality of 
third party search engines; 

means for parsing the responses from the third 
party search engines in order to extract information 
regarding the documents matching the query; 

means for downloading the full text of the 
documents matching the query; 

means for locating query terms in the documents 
and extracting text surrounding the query terms; and 

means for displaying the text surrounding the 
query terms. 

47. A meta search engine according to Claim 46, further 
including means for the progressive display of the text 
surrounding the query terms as the documents are 
retrieved. 

48. A meta search engine according to Claim 46, further 
including means for the filtering of the context strings 
in order to improve readability by removing redundant 
whitespace, repeated characters, HTML comments and tags, 
and special characters. 

49. A meta search engine according to Claim 46, further 
including means for the identification and filtering of 
pages which no longer contain the query terms. 

50. A meta search engine according to Claim 46, further 
including a mechanism for clustering the documents based 
on analysis of the full text of each document and 
identification of co-occurring phrases and words, and 
conjunctions thereof. 

51. A meta search engine according to Claim 46, further 
including a mechanism for storing the documents matching 
a query so that a query can be repeated and for only 
showing documents which are new or have been modified 
since the last query or a given date. 

52. A computer-implemented meta search engine 
comprising: 

means for forwarding a query to a 
third party search engine; 

means for parsing the responses from the third 
party search engine in order to extract information 
regarding the documents matching the query; 

means for downloading the full text of the 
documents matching the query; 

means for locating query terms in the documents 
and extracting text surrounding the query terms; and 

means for displaying the text surrounding the 
query terms. 

53. A meta search engine according to Claim 52, further 
including means for the progressive display of the text 
surrounding the query terms as the documents are 
retrieved. 

54. A meta search engine according to Claim 52, further 
including means for the filtering of the context strings 
in order to improve readability by removing redundant 
whitespace, repeated characters, HTML comments and tags, 
and special characters. 

55. A meta search engine according to Claim 52, further 
including means for the identification and filtering of 
pages which no longer contain the query terms. 

56. A meta search engine according to Claim 52, further 
including a mechanism for clustering the documents based 
on analysis of the full text of each document and 
identification of co-occurring phrases and words, and 
conjunctions thereof. 

57. A meta search engine according to Claim 52, further 
including a mechanism for storing the documents matching 
a query so that a query can be repeated and for only 
showing documents which are new or have been modified 
since the last query or a given date. 

58. A computer-implemented keyword based image search 
engine system, comprising: 

means for forwarding a query to a number of 
third party image search engines; 

means for parsing the responses from the third 
party search engines in order to extract information 
regarding the images matching the query; 

means for downloading the images matching the 
query; and 

means for displaying thumbnails of the images 
to the user. 






59. A system according to claim 58, further including 
means for selectable filtering of the images based on 
size, color, or semantic attributes of the images. 

60. A system according to claim 58, further including 
means for identifying and filtering commonly used images 
on the Web such as the Netscape Now image and horizontal 
bars used to separate sections of a document. 

61. A system according to claim 58, further including 
means for identifying and filtering similar images. 

62. A system according to claim 58, further including 
means for identifying the type of an image, e.g. 
photograph, line drawing, logo, map, cartoon, portrait, 
button, chart, or astronomical pictures, and filtering 
based on the image type. 



63. A system according to claim 58, further including 
means for storing the images matching a query so that a 
query can be repeated, and only new images are shown. 

64. A system according to claim 58, further including 
means for storing the meta-information (e.g. type of 
image) so that images may be filtered using the 
meta-information without downloading the image again for new 
queries. 

65. A system according to claim 58, further including 
means for displaying the full image along with the 
document referring to it if possible, and means for 
highlighting of query terms in the document. 

66. A computer-implemented keyword based image search 
engine, comprising: 

means for forwarding the query to a plurality 
of third party text search engines; 

means for parsing the responses from the third 
party search engines in order to extract information 
regarding the documents matching the query; 

means for downloading the documents matching 
the query; 

means for analyzing the documents and locating 
images which may match the user query based on the 
proximity of query terms to image tags or references; 
means for downloading the images; and 
means for displaying thumbnails of the images 
to the user. 

67. A system according to claim 66, further including 
means for selectable filtering of the images based on 
size, color, or semantic attributes of the images. 



68. A system according to claim 66, further including 
means for identifying and filtering commonly used images 
on the Web such as the Netscape Now image and horizontal 
bars used to separate sections of a document. 

69. A system according to claim 66, further including 
means for identifying and filtering similar images. 

70. A system according to claim 66, further including 
means for identifying the type of an image, e.g. 
photograph, line drawing, logo, map, cartoon, portrait, 
button, chart, or astronomical pictures, and filtering 
based on the image type. 

71. A system according to claim 66, further including 
means for storing the images matching a query so that a 
query can be repeated, and only new images are shown. 

72. A system according to claim 66, further including 
means for storing the meta-information (e.g. type of 
image) so that images may be filtered using the 
meta-information without downloading the image again for new 
queries. 

73. A system according to claim 66, further including 
means for displaying the full image along with the 
document referring to it if possible, and means for 
highlighting of query terms in the document. 

74. A computer-implemented method for estimating the 
relative coverage of third-party search engines which 
comprises the steps of: 

forwarding a set of queries to two third-party 
search engines; 

retrieving the full list of results from each 
search engine; 

retrieving the text of all pages listed by the 
search engines; 

filtering out pages which are unavailable or no 
longer match the query; and 

comparing the number of remaining pages from 
each engine. 
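
By way of non-limiting illustration, the filtering and comparing steps may be sketched as follows. The names `count_valid_pages`, `relative_coverage`, and `fetch` are hypothetical. The sketch assumes that `fetch` returns `None` for an unavailable page and that a page matches a query when every whitespace-separated query term occurs in its text.

```python
from typing import Callable, Optional


def count_valid_pages(
    results: dict[str, list[str]],
    fetch: Callable[[str], Optional[str]],
) -> int:
    """Count pages that are available and still match their query.

    `results` maps each query to the URLs one engine returned for it.
    """
    valid = 0
    for query, urls in results.items():
        terms = query.lower().split()
        for url in set(urls):
            text = fetch(url)
            if text is None:
                continue  # page is unavailable
            lowered = text.lower()
            if all(term in lowered for term in terms):
                valid += 1
    return valid


def relative_coverage(engine_a, engine_b, fetch) -> float:
    """Ratio of valid pages from engine A to valid pages from engine B."""
    count_b = count_valid_pages(engine_b, fetch)
    if count_b == 0:
        raise ValueError("engine B returned no valid pages")
    return count_valid_pages(engine_a, fetch) / count_b
```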

75. A computer-implemented method for information 
retrieval which comprises the steps of: 

recognizing a query in the form of a question; 

transforming the question into a set of one or 
more specific forms in which the answer to the question 
might be expressed; and 

searching for the transformed query. 
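
By way of non-limiting illustration, the recognizing and transforming steps may be sketched with manually written patterns. The rules, the answer templates, and the function name `transform_question` are hypothetical examples chosen for the sketch; a query that matches no rule is returned unchanged.

```python
import re

# Hypothetical, manually written rules: (question pattern, answer templates).
RULES = [
    (
        re.compile(r"^what does (.+) stand for\??$", re.IGNORECASE),
        ['"{0} stands for"', '"{0} is an abbreviation"', '"{0} means"'],
    ),
    (
        re.compile(r"^what is (?:a |an |the )?(.+?)\??$", re.IGNORECASE),
        ['"{0} is"', '"{0} refers to"'],
    ),
]


def transform_question(query: str) -> list[str]:
    """Turn a question into phrase queries in which an answer might appear."""
    stripped = query.strip()
    for pattern, templates in RULES:
        match = pattern.match(stripped)
        if match:
            return [template.format(match.group(1)) for template in templates]
    return [stripped]
```

Each returned phrase may then be submitted as a separate query in the searching step.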

76. The method according to claim 75, wherein the 
specific expressive forms for each type of question are 
manually written. 

77. The method according to claim 75, wherein the 
specific expressive forms for each type of question are 
learnt by analyzing the context of query terms in the 
documents which users select from the search method 
comprising the steps of: 

forwarding a query to a plurality of third 
party search engines; 

parsing the responses from the third party 
search engines in order to extract information regarding 
the documents matching the query; 

downloading the full text of the documents 
matching the query; 

locating query terms in the documents and 
extracting text surrounding the query terms; 

displaying the text surrounding the query 
terms; and 

identifying common forms of the context. 

78. A computer-implemented method for query expansion 
which comprises the steps of: 

stemming the query terms; 

searching the set of query result pages for 
commonly occurring morphological variants of the query 
terms; and 

using the commonly occurring morphological 
variants for query expansion. 
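
By way of non-limiting illustration, the steps of this method may be sketched as follows. The suffix-stripping function `crude_stem` is a deliberately simple, hypothetical stand-in for a real stemming algorithm, and the `min_count` threshold for "commonly occurring" is an assumption of the sketch.

```python
import re
from collections import Counter

SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word: str) -> str:
    """Strip one common English suffix (stand-in for a real stemmer)."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_query(terms: list[str], result_pages: list[str], min_count: int = 2) -> list[str]:
    """Return variants of the query terms that occur often in the result pages."""
    originals = {t.lower() for t in terms}
    stems = {crude_stem(t) for t in originals}
    counts = Counter()
    for page in result_pages:
        for word in re.findall(r"[a-z]+", page.lower()):
            if word not in originals and crude_stem(word) in stems:
                counts[word] += 1
    return [word for word, count in counts.most_common() if count >= min_count]
```

The returned variants may be added to the original query, for example as alternatives to the term from which they derive.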

ABSTRACT OF THE DISCLOSURE 

A computer implemented meta search engine and 
search method. In accordance with this method, a query 
is forwarded to one or more third party search engines, 
and the responses from the third party search engine or 
engines are parsed in order to extract information 
regarding the documents matching the query. The full 
text of the documents matching the query is downloaded, 
and the query terms in the documents are located. The 
text surrounding the query terms is extracted, and that 
text is displayed. 






[Figure 1 screenshot: query form with a "Find" box and options for Locality, Age limit, Depth, Images, Hits, Context, Cluster, and Tracking, each with a "hide" link; a tip noting that the "hide" links save screen space; a "Tomorrow at NECI" listing of talks and meetings; and a chart titled "Coverage WRT Estimated 'Indexable Web' Size" comparing HotBot, Excite, Northern Light, AltaVista, Infoseek, and Lycos with the estimated total.] 

Figure 1. Home page of the NECI meta search engine. 



[Figure 2 screenshot: options form with settings for the individual page timeout, filtering pages to highlight hits when viewing the full page, filtering images from the pages when viewed, displaying pages for each query in the same or a new window, image classification, and adding a URL to track; a list of queries being tracked ("search engine" and signafy), each with a "Stop tracking query" control; and a list of URLs being tracked, each with a "Stop tracking URL" control.] 

Description of the options on the main page: 

Hits: Maximum number of hits to display excluding duplicates 

Context: Number of context characters to show either side of the query terms 

Cluster: Cluster documents after retrieval 

Tracking: Start tracking this query and tell me when new documents appear which match the query 

Locality: Only show documents in this domain 

Age limit: Filter out documents older than the specified age 

Depth: Only show documents with a given subdirectory depth 

Images: Filter images and only show photos or graphics 

Figure 2. Options page of the NECI meta search engine. 



[Figure 3 screenshot: results page for the query nec and "digital watermark". The header reads: Searching for: +nec +"digital watermark" using: HotBot Infoseek AltaVista Excite Lycos Northern Light Yahoo WebCrawler. A tip notes that the query term links in the "Searching for" line lead to the Webster dictionary definitions. Each hit shows the page title, its URL, and context strings surrounding the query terms. The hits shown are: Ingemar Cox Home Page; NECI Technical Report 95-10; Mass High Tech; BU CAS CS 585: Image and Video Computing, Syllabus; and SMH COMPUTERS February 20 1996: Mark to foil Net pirates. The remainder of the listing is deleted.] 

Figure 3. First portion of a sample response of the NECI meta search engine for the query nec and "digital 
watermark". 



[Figure 4 screenshot: list headed "Ranked pages (first 20)". Each hit shows a relevance score, the page title, its URL, and context strings surrounding the query terms. The hits shown are: NEWSbytes (299.974); tigermarktwo.html (299.931); and Focus on Internet (299.89). The remainder of the listing is deleted.] 

Figure 4. Second portion of a sample response of the NECI meta search engine for the query nec and "digital 
watermark", showing the pages ranked according to the relevance measure (equation 1) which includes term 
proximity information. 



Only 1 search term was found in these documents: 



ID ARIS Technologies' Homepage H 16d 2k http^wvvw.musicodexom/welcome.html 

|... ARIS Technologies' Homepage ARIS Technologies is an industry leader in digital watermarking. We deal 
^exclusively with protecting intellectual property such as audio, video, and multimedia... 

Psych 267 Final Projects A n/a 8k http://white.stanford.edu/~heeger/psych267/final.html
...nteractive lighting design. Proceedings Eurographics '95, p. 229-240, 1995 ( preprint ). Digital watermark.
References: Cox, Kilian, Leighton and Shamoon, "A Secure, Imperceptible yet Perceptually sal... /..., IBM Tech.
Report ( preprint available ). Further links to other papers and resources on digital watermarks. Face recognition
with "eigenfaces". References: Turk and Pentland, "Face recognition u...

Digimarc receives funding of $4.5M. N n/a 6k
http://www.nlsearch.com/cgi-bin/pdserv.pl?cbrecid=YY19970425030163059&ho=typhoon&po=5005

Summary: First licensee is Adobe. Digimarc, the company that last year announced its Imagemarc digital
watermark technology, seems to be ready to make its move in the market. ...

Newsbytes Daily Summary N ** 0d ** 28k http://newsbytes.mpx.com.au/newsbytes/daily.html
... Lernout & Hauspie [NASDAQ:LHSPF] (L&H) of Burlington, Massachusetts, and Ieper, Belgium. CHIPS NEC
Develops World's Smallest Transistor TOKYO, JAPAN, 1997 SEP 11 (NB) - By Martyn Williams. NEC Co... /...PS
NEC Develops World's Smallest Transistor TOKYO, JAPAN, 1997 SEP 11 (NB) - By Martyn Williams. NEC
Corporation [TOKYO:6701] says it has developed the world's smallest operational transistor, a me... /...te length of
14 nanometers (14 millionths of a millimeter). The achievement was reached as part of NEC's development of a 10
terabit memory chip. Intel Advances Mobile PC Platform HONG KONG, CHINA....

A letter from the publisher TIME, December 6, 1971 L 1y 3k
http://electron.rutgers.edu/~myadav/war71/wall/dec6a.html

... tough warning to India. But the only evidence of war that night was the blackout which was quite unnecessary."
From the correspondents' files, and from background research assembled by Reporter-Res...

Figure 5. Third portion of a sample response of the NECI meta search engine for the query nec and "digital 
watermark", showing pages where only one of the query terms was found. 
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No search terms were found in these documents: 



Article Two H 1y 1k http://miavx1.muohio.edu/~whittijs/articletwo.html

Userdir rule failure The server was unable to resolve the requested /~username reference, possible causes include:
Username invalid Server is unable to determine username's login directory due to insufficient privilege to

Jonathan G. Campbell University of Ulster, N. Ireland, WWW Links I n/a 1k http://www.iscm.ulst.ac.uk/~jon/book/
J.G. Campbell's Bookmarks From 27 August 1997 this page is *permanently* relocated to
http://www.infm.ulst.ac.uk/~jgc/book Updated 27 August 1997 - JG.Campbell@ulst.ac.uk

CIOS/Comserve WWW server address has changed L 5m 1k
http://cios.llc.rpi.edu:4997/mailboxes/comgrads\08085104.118

CIOS/Comserve WWW server address has changed The CIOS web server address has changed. It is now
http://www.cios.org Please note too the new email address for the Comserve email interface to

DEFINE IMAGE L 9m 2k http://iram.fr/doc/sic/node58.html

DEFINE IMAGE Next: DEFINE /LIKE Up: DEFINE Previous: DEFINE HEADER DEFINE IMAGE DEFINE
IMAGE Var1 File1 Key1 [Var2 File2 Key2 [...]] [/GLOBAL]

Arizona Off-Road L 3m 3k http://www.azoffroad.com

Arizona Off-Road 1833 W. Mountain View Road Phoenix, AZ 85021 ATC's MOTORCYCLES JET SKIS GO
KARTS

Resultats dans les cantons L 21d 5k http://www.admin.ch/ch/f/pore/va/19840226/can316.html
Votation no 316 - Resultats dans les cantons Tableau recapitulatif / deutsch Votation no 316 Resultats dans les
cantons Arrete federal concernant la perception d'une redevance sur le trafic des poids lourds du 24 juin 19

"We Know How the Parisians Felt" L 1y 6k http://electron.rutgers.edu/~myadav/war71/wall/dec27b.html

"We Know How the Parisians Felt" "We Know How the Parisians Felt" Section: Box, Page, TIME, Dec. 27, 1971
Time Correspondent Dan Coggin, who covered the war from Pakistani side, was in Dacca when that city
surrendered. His repor

The U.S.: A Policy in Shambles L 1y 6k http://electron.rutgers.edu/~myadav/war71/wall/dec20b.html
The U.S.: A Policy in Shambles The Nixon Administration drew a fusillade of criticism last week for its policy on
India and Pakistan. Two weeks ago, when war broke out between two traditional enemies, a State Department
spokesman issued

ClariNet Tearsheet: Government, Business, and General News N ** 0d ** 8k http://www.clari.net/Samples/nb-other.html
ClariNet Tearsheet: Government, Business, and General News ClariNet * ClariNet Tearsheet: Government,
Business, and General News ClariNet Tearsheet: Government, Business, and General News This summary of
computer and technology news is

Figure 6. Fourth portion of a sample response of the NECI meta search engine for the query nec and "digital 
watermark", showing pages where none of the query terms were found. 



Pages with duplicate context strings to a page above:

Vol. 1, No. 19 A 1y 18k http://www.media.sbexpos.com/BULL/BUL0119.HTM
Alternate H 1y 18k http://www.seyboldreport.com/BULL/BUL0119.HTM

...ore than five million members Presstek floats additional stock Pearlsetters launched in Europe NEC announces
digital watermark Oracle to include software suite with Internet box Ca... /...BT have launched Presstek's
Pearlsetters in seven European countries. Reuters reports that NEC claims to have developed a digital watermark
system that could protect digital files... /...n members Presstek floats additional stock Pearlsetters launched in
Europe NEC announces digital watermark Oracle to include software suite with Internet box Canon combines
divisions within an 1... /...ters in seven European countries. Reuters reports that NEC claims to have developed a
digital watermark system that could protect digital files, such as still images, video and audio, from unauth...

(http://www.videodiscovery.com/vdyweb/dvd/dvdfaq.txt) H 1m 118k

http://www.videodiscovery.com/vdyweb/dvd/dvdfaq.txt

... standard (CGMS/D) is not yet finalized, but will apply to digital connections such
as IEEE 1394/Firewire. 3) Because of the potential for perfect digital copies, paranoid... /...isplaying it. No
unscrambled digital output is allowed until work in progress for secure digital connections is finished. On the
computer side, DVD-ROM drives and video display/decoder hardware or softw... /...d a PCM audio track. (Other
streams such as Dolby Digital audio, MPEG audio, and subpicture are not necessary for the simplest case.) Basic
DVD control codes are also needed. At the moment it's difficul... /...doing this, but it's possible. The music industry
is also requesting an "embedding signalling" or "digital watermark" copy protection feature. This applies a digital
signature to the audio in the form of supposedly ...

Hyflex J1 Launch H 3m 3k http://jpn.co.jp/jan96/jp14.html
Hyflex J1 Launch H 3m 3k http://jpn.co.jp/feb96/jp14.html

... Hyflex J1 Launch NEC Develops Digital Watermarking Technique JPN Scientists at NEC Research Institut... /...
NEC Develops Digital Watermarking Technique JPN Scientists at NEC Research Institute in Princeton,
NJ, have developed a digital watermarking method that could be us... /...ary information is increasing an issue,"
said Tatsuo Ishiguro, associate senior vice president of NEC Corp. "...I am convinced that our watermarking
technique is a solution that will be welcomed espec... /... Hyflex J1 Launch NEC Develops Digital Watermarking
Technique JPN Scientists at NEC Research Institute in Princeton, NJ, have de... /...ique JPN Scientists at NEC
Research Institute in Princeton, NJ, have developed a digital watermarking method that could be used to protect
... Internet. Con... /...e is no way to track its reproduction and therefore it
provides little protection against piracy. A digital watermark, however, can protect a copyright by means of an
invisible identification code that is permanently...

Internet H n/a 20k http://net.info.nVuL'0296/inlernet.html

...tscape servers. Dit kwam o.a. door het feit dat bepaalde optionele onderdelen zoals een database-connector duur
betaald moeten worden. Microsoft op zijn beurt deed daar weer een schepje bovenop door we... /...
http://www.microsoft.com/windows http://www.netscape.com NEC ontwikkelt
Digital Watermark Technology NEC is in zijn computerlaboratoriums bezig met een digi... /...
microsoft.com/windows http://www.netscape.com NEC ontwikkelt Digital Watermark Technology NEC is in
zijn computerlaboratoriums bezig met een digitaal watermerk. Dit watermerk moet in de toekom... /...
www.microsoft.com/windows http://www.netscape.com NEC ontwikkelt Digital Watermark Technology
NEC is in zijn computerlaboratoriums bezig met een digitaal watermerk. Dit watermerk ...

Figure 7. Fifth portion of a sample response of the NECI meta search engine for the query nec and "digital 
watermark", showing pages which contained duplicate context strings to pages found earlier. 
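
The grouping shown in Figure 7 can be illustrated with a minimal sketch. It assumes that two pages count as duplicates when their context strings are identical after lower-casing and whitespace normalization; the comparison actually used by the engine is not stated here, and the function name `group_by_context` is hypothetical.

```python
def group_by_context(results):
    """Split (url, context) pairs into first-seen pages and duplicates.

    Assumption: contexts are compared after lower-casing and collapsing
    whitespace. This normalization is illustrative only.
    """
    seen = {}          # normalized context -> URL of the first page that had it
    unique = []        # URLs listed in the main results
    duplicates = []    # (duplicate URL, URL of the earlier page) pairs
    for url, context in results:
        key = " ".join(context.lower().split())
        if key in seen:
            duplicates.append((url, seen[key]))
        else:
            seen[key] = url
            unique.append(url)
    return unique, duplicates


results = [
    ("http://jpn.co.jp/jan96/jp14.html", "NEC Develops Digital Watermarking Technique"),
    ("http://jpn.co.jp/feb96/jp14.html", "NEC  develops digital watermarking technique"),
    ("http://example.org/other", "An unrelated context string"),
]
unique, duplicates = group_by_context(results)
```

With this input the second page is reported as a duplicate of the first, mirroring the paired listings in the figure.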



These documents no longer exist: 



Error 404 Not found - file doesn't exist or is read protected [even tried multi] Digital Image Watermarking: Main
Project Page http://www.csuglab.cornell.edu/

Error 404 Not found Labeling Techniques for Multimedia Data:
http://www-it.et.tudelft.nl/pda/smash/public/benelux_cr.html

Error 404 Not found Labeling Techniques for Multimedia Data:
http://www-it.et.tudelft.nl/pda/smash/public/benlx96/benelux_cr.html

Error 404 Not Found Artisoft Inc. -- Industry Awards and Recognition http://artisoft.com/main/overview/awards.html
Error 404 File Not Found The Rutgers Review http://electron.rutgers.edu/~nebus/

This search: +nec +"digital watermark" Search engine pages: AltaVista Page 2 Page 3 Excite Page 2
HotBot Page 2 Infoseek Lycos Northern Light Page 2 WebCrawler Yahoo

Query expansion (adding these words to the query may help): digitally (16) digitized (16) digit (9) digitale (8)
digitaal (8) digitization (5) digits (3) digitize (3) watermarking (463) watermarks (127) watermarked
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More documents were found but the maximum number of hits was reached* 



2 terms: 70 1 term: 5 0 terms: 11 duplicate context: 9 invalid link: 5 

Figure 8. Sixth (and final) portion of a sample response of the NECI meta search engine for the query nec and 
"digital watermark", showing the summary information including the number of results from each individual 
engine, etc. 



Jump to: nec (2) digital watermark (2) http://www.neci.nj.nec.com/tr/neci-abstract-95-10.html [Track page]



NECI Technical Report 95-10

NEC Research Institute, 4 Independence Way, Princeton, NJ 08540.

Secure Spread Spectrum Watermarking for Multimedia

Ingemar J. Cox, Joe Kilian, Tom Leighton, and Talal Shamoon. December 4, 1995.

We describe a digital watermarking method for use in audio, image, video and multimedia data. We argue that a
watermark must be placed in perceptually significant components of a signal if it is to be robust to common signal
distortions and malicious attack. However, it is well known that modification of these components can lead to
perceptual degradation of the signal. To avoid this, we propose to insert a watermark into the spectral components of
the data using techniques analogous to spread spectrum communications, hiding a narrow band signal in a wideband
channel that is the data. The watermark is difficult for an attacker to remove, even when several individuals conspire
together with independently watermarked copies of the data. It is also robust to common signal and geometric
distortions such as digital-to-analog and analog-to-digital conversion, resampling, and requantization, including
dithering and recompression and rotation, translation, cropping and scaling. The same digital watermarking
algorithm can be applied to all three media under consideration with only minor modifications, making it especially
appropriate for multimedia products. Retrieval of the watermark unambiguously identifies the owner, and the
watermark can be constructed to make counterfeiting almost impossible. Experimental results are presented to support
these claims.

Figure 9. Sample page view for the NECI meta search engine. The query terms are highlighted and the links at the top 
jump directly to the first occurrence of the respective query terms. 




Figure 10. Simplified control flow of the meta search engine. Interactions with the page retrieval daemon are shown 
in gray. 




Figure 11. Simplified control flow for image meta search. Interactions with the page retrieval daemon are shown in
gray.



Find: koala 



Tracking: no

Searching for: koala using: WebSeer Corel Lycos Yahoo HotBot AltaVista.
Tip: The bar to the left of the titles is longer when the query terms are closer together in the document 
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Figure 12. First portion of a sample response of the NECI meta search engine for the query koala in the 
image databases, filtered for photos.



This search: koala Search engine pages: AltaVista Images Corel Images HotBot Images Page 2 Page 3 Page 4
Page 5 Page 6 Lycos Images Page 2 Page 3 Page 4 Page 5 WebSeer Yahoo Images



[Table of per-engine results for AltaVista Images, Corel Images, HotBot Images, Lycos Images, and Yahoo Images; values illegible in source.]

More documents were found but the maximum number of hits was reached.
Filtered due to size: 12 Filtered due to type: 21

Figure 13. Second portion of a sample response of the NECI meta search engine for the query koala in the image 
databases, filtered for photos. 




Searching for: koala using: WebSeer Corel Lycos Yahoo HotBot AltaVista . 
Tip: You can search for links to a specific page, e.g. link:www.neci.nj.nec.com/homepages/giles. Self links are excluded.







More images were found but the maximum number of hits was reached.

This search: koala Search engine pages: AltaVista Images Corel Images HotBot Images Lycos Images Page 2
Page 3 Page 4 Page 5 Page 6 Page 7 WebSeer Yahoo Images



[Table of per-engine results (columns include Use, Retrieved, Processed, Duplicates) for AltaVista Images, Corel Images, HotBot Images, Lycos Images, WebSeer, and Yahoo Images, with a Total row; most values illegible in source.]
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;o 

:5: 
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Afore documents were found but the maximum number of hits was reached. 
Filtered due to size: 2 Filtered due to type: 61 

Figure 14. Sample response of the NECI meta search engine for the query koala in the image databases, filtered for 
graphics. 
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Figure 15. Clusters for the query "joydeep ghosh". 



Cluster summaries: 




Document:...by clicking on . Journal Papers: Ismail Taha and Joydeep Ghosh, "A Hybrid Intelligent Architecture and
Its Application to Water Reserve... /...d to Journal of Smart Engineering Systems. Ismail Taha and Joydeep Ghosh,
"Symbolic Interpretation of Artificial Neural Networks", submitted ... /... Austin, 1996. Conference Papers: Ismail
Taha and Joydeep Ghosh, "Evaluation and Ordering of Rules Extracted from Feedforward Networks... /...Also,
Tech. Rep. TR-97-01-106, The Computer and Vision Research Center, University of Texas, Austin, 1996.
Conference Papers: Ismail Taha an...

Document:...Joydeep Ghosh... /... Joydeep Ghosh Joydeep Ghosh Telephone: (512) 471-8980 Fax: (512) 471-5...
/... Joydeep Ghosh Joydeep Ghosh Telephone: (512) 471-8980 Fax: (512) 471-5532 E-mail: ghosh@pin... /...Fax:
(512) 471-5532 E-mail: ghosh@pine.ece.utexas.edu Address: The University of Texas at Austin Department of
Electrical & Computer Engineering,..

Document:... Yoan Shin and Joydeep Ghosh Department of Electrical and Computer... /...Yoan Shin and Joydeep
Ghosh Department of Electrical and Computer Engineering The University of Texas... /...in and Joydeep Ghosh
Department of Electrical and Computer Engineering The University of Texas at Austin Austin, TX 78712 Abstract
This paper introduces a nov...
...more...



Document:... Artificial Neural Networks Authors: Bryan W. Stiles and Joydeep Ghosh Department of Electrical and
Computer Engineering The Unive... /...rsity of Texas at Austin Correspondence: Bryan Stiles c/o Joydeep Ghosh
Department of Electrical and Computer Engineering The Unive... /...Phone: (512) 471-2358 Email:
bstiles@pine.ece.utexas.edu Joydeep Ghosh Department of Electrical and Computer Engineering The Univ... /... A
Habituation Based Mechanism for Encoding Temporal Information in Artificial Neural Networks Authors: Bryan W.
Stiles and Joydeep Ghosh Department o... /...l: ghosh@pine.ece.utexas.edu Submit to: Applications and Science of
Artificial Neural Networks Steven K. Rogers and Dennis W. Ruck at AeroSense '9...

Document:... (eds.), IEEE Press, 1995, pp. 135-144. Bryan W. Stiles and Joydeep Ghosh, "A Habituation Based
Mechanism for Encoding Temporal Information in Arti... /...E Proc. Vol. , Orlando, April 1995, pp. Bryan W. Stiles
and Joydeep Ghosh, "Habituation Based Neural Classifiers for Spatio-temporal Signals", Pro... /...Proc. ICASSP-95,
Detroit, May 1995, pp. Bryan W. Stiles and Joydeep Ghosh, "Dynamic Neural Networks for the Classification of
Oceanographic Data",... /...Ghosh, "A Habituation Based Mechanism for Encoding Temporal Information in Artificial
Neural Networks", (invited paper) Proc. SPIE Conf. on Applications and Science of Artif... /...tworks", (invited
paper) Proc. SPIE Conf. on Applications and Science of Artificial Neural Networks IV, SPIE Proc. Vol. , Orlando,
April 1995, pp. Bryan W. St...

Document:...iuh.edu Larry D. Jackel Robert E. Schapire Y. Freund Kagan Tumer and Joydeep Ghosh Shimon
Edelman Jonathan Baxter Anders Krogh and Jesper Vedelsby ... /...ftp from
"ftp://eris.wisdom.weizmann.ac.il/pub/mam.ps.Z" Kagan Tumer and Joydeep Ghosh, "Theoretical Foundations of
Linear and Order Statistics Combiners for Ne... /...When Networks Disagree: Ensemble Methods for Neural
Networks", Chapter 10, Artificial Neural Networks for Speech and Vision, editor R.J. Mammone, Chapman-Hall,
London, 1995 M....




artificial neural networks

...more...



Figure 16. The first two cluster summaries for the query "joydeep ghosh". 




Husky Search 



Query: (joydeep ghosh) 

Documents: 102, Clusters: 14, Average Cluster Size: 11.21 documents 



Document Group   Size   Phrase and Sample Document Titles

Cluster 1        19     Artificial (74%), Hybrid intelligent (32%), domain knowledge (26%)
                        ORGANIZING COMMITTEE
                        Untitled
                        Abstract
                        Abstract
                        Untitled

Cluster 2               Click on to view the abstract and on to obtain a postscript copy. (71%), postscript copy of the full paper
                        (100%), paper is currently not available. (57%)
                        Kagan Tumer's Publications
                        Classification
                        Working Papers
                        Refereed Archival Journal Publications (Full/Regular Papers)
                        Technical Reports

Cluster 3        15     Kagan tumer (100%), NEURAL CLASSIFIERS Kagan Tumer and Joydeep ghosh (27%), pattern
                        classifiers (20%), tumer (53%)
                        Untitled
                        Untitled
                        Abstract
                        Abstract
                        Abstract
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31 



of Electrical and Computer (88%)
Untitled
Untitled
LANS Home Page
Joydeep Ghosh
LANS Home Page

e.ece (32%), RESEARCH & EDUCATIONAL RESOURCES/ORGANIZATIONS/centers (10%), of Texas
(10%), er (81%)
Untitled
Untitled
Batch 92 Chemical Engineering
LANS Homepage
Joydeep Ghosh



Cluster 6 



Cluster 7 
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University of Texas at austin (50%), University of texas (67%), texas (100%) 

Untitled
Untitled
LANS Home Page
Joydeep Ghosh
Joydeep Ghosh

ters. Journal (100%)
LANS Home Page
Refereed Archival
Refereed Archival Journal Publications (Full/Regular Papers)


Figure 17. First part of the clusters for the query "joydeep ghosh" from HuskySearch. 
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[papers (100%) 
Kagan Tumer's Publications
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Kurt's Publications 
Working Papers
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ons. Books (100%) 
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i LANS Home Page 
Associated Members
No Title

s, Austin (100%)
Untitled
LANS Home Page
LANS Home Page
No Title



Cluster 12 



11 



combining (100%), outputs (64%) 

Abstract 

Abstract 

Abstract 

Untitled 

Abstract 
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ted (100%) 

LANS Home Page 
Kagan Tumer's Publications
CIS Publications Database
Kurt's Publications
Refereed Archival Journal Publications (Full/Regular Papers)
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LANS Home Page 
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Figure 18. Second part of the clusters for the query 11 joydeep ghosh" from HuskySearch. 




26% Ghosh, neural, artificial, networks, sonar, austin, classification, texas, signals

24% Orthod, orthodontics, orthodontic, orthop, orthodontists, mandibular, maxillary, nanda,
craniofacial
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Figure 19. Clusters for the query " joydeep ghosh" from AltaVista. 
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artificial neural networks 
networks neural networks 
ieee international conference 

university of texas

pacific northwest laboratory

recurrent neural networks
nnw in hep

pacific northwest national

university of california

austrian research institute 

self organizing map 

northwest national laboratory 

artificial intelligence 

pattern recognition 

research group 

international conference 

fuzzy logic 

san diego 

genetic algorithms 

ieee transactions 
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technical report 

machine learning 

data 

nets 

ai 

software 



Figure 20. Clusters produced by the NECI meta search engine for the query "neural network". 
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carpal tunnel syndrome 
repetitive stress injuries 
software monitoring tools 
contain useful advice 

repetitive strain injuries

san jose state OR san jose state AND archive

kinds of documents 

university of nebraska 

htm diana carroll 

products include split 

dan wallach 

keyboard alternatives 

interest finder OR browse groups 

human factors 

tifaq general 

injuries 

related 

archive 

resources 

contain useful advice AND resources OR contain useful advice AND keyboard alternatives
repetitive stress injuries AND keyboard alternatives
repetitive stress injuries AND dan wallach

carpal tunnel syndrome

Document:...FAQ - Typing Injury ... /... Typing Injury FAQ Home Page [TIFAQ] [General] [Keyboards] [Speech] [Mice] [Sof... /...FAQ -
Typing Injury ... /... Typing Injury FAQ Home Page [TIFAQ] [General] [Keyboards] [Speech] [Mice] [Software] ... /...
sources of information for people with typing injuries, repetitive stress injuries, carpal tunnel syndrome, etc. The TIFAQ is targeted at
computer users suffering at the hands of their equipment. You...

Document:... JXM's Ergonomics Page is a site by John Murray at University of Michigan that focuses on typing injuries, carpal tunnel
syndrome and design concepts. Office Working Postal is a cororoe... /...eral links to various safety oriented servers and
newsgroups. You do the searching. Typing Injuries is everything you ever wanted to know about typing injuries by Dan Wallach
... /... Princeton. Lots of publications and links. Everything you wanted to know and more. Typing Injury Archive is a typing
injury library by Dan Wallach at Princeton. Here you will find a well cate... /...ons and links. Everything you wanted to know and more.
Typing Injury Archive is a typing injury library by Dan Wallach at Princeton. Here you will find a well categorized list of typing injury
... /... and Ergonomics Home Page links to several ergonomic sites that focus on safety issues, such as: carpal tunnel syndrome, back
injuries, air quality, sick building syndrome, and lighting. Lawrence Livermore Lab ... /...al technology and human factors ...
Mostly in-house work but an interesting site. Carpal Tunnel Syndrome is a commercial site but does have lots of references to CTS.
Specific emphasis is on keyboard...

Document:... at the keyboard. Site offers Time Out For Windows, an ergonomic exercise break program. Typing Injury FAQ - This is the
home page for the Typing Injury FAQ and Typing Injury Archive. ... /... an ergonomic exercise break program. Typing Injury FAQ -
This is the home page for the Typing Injury FAQ and Typing Injury Archive. NEW! University of Minnesota Office Ergonomics
... /...ate, how one gets them, and some guidelines for how one may help heal oneself from this devastating injury. Carpal Tunnel Syndrome
& Repetitive Stress Computer Related Repetitive
Strain Injury: I hope on this page to provide a very brief introduction to RSI for the benefit of students who... /...em and some guidelines
for how one may help heal oneself from this devastating injury. Carpal Tunnel Syndrome & Repetitive Stress Computer Related
Repetitive Strain Injury: I hope on this page t... /...pause helps you avoid OOS / RSI with Micropauses and Exercise Breaks Patient's
Guide to Carpal Tunnel Syndrome: The following documents attempt to explain what Carpal Tunnel Syndrome is, how it is
diagnosed

Document:... Typing Injury FAQ: General Information ... /... Typing Injury FAQ: General Information General Information [TIFAQ]
... /... Typing Injury FAQ: General Information ... /... Typing Injury FAQ: General Information General Information
... RSI. WRULD Work Related Upper Limb Disorders - yet another synonym for RSI. CTS
Carpal Tunnel Syndrome (see below) Hyperextension Marked bending at a joint Pronation Turning the palm down ... /... wrist
... and it gets worse with repetitive activity. Carpal Tunnel Syndrome: the nerves that run through your wrist
into your fingers get trapped by the inflamed mu...

...more... 

Figure 21. Clusters produced by the NECI meta search engine for the query typing and injury along with the first 

cluster summary. 



Searching for: "NASDAQ stands for" "NASDAQ is an abbreviation" "NASDAQ means" using: HotBot
Infoseek AltaVista Excite Lycos Northern Light Yahoo WebCrawler.

Tip: For better precision with multiple terms you might like to use + to ensure that the results contain specific terms (e.g.
+"lee giles" +optics).

Ref:...viation for the New York Stock Exchange AMEX is an abbreviation for the American Stock Exchange
NASDAQ is an abbreviation for the National Association of Securities Dealers Automatic Quotation Exchange ...
of the ...

Ref:...nformation on NASDAQ and the companies traded thereon. (Incidentally, does anyone know what NASDAQ
stands for?) NYSE All about the New York Stock Exchange. Data mongers loo

Ref:... NASDAQ Last Revised: 25 Oct 1996 From: billmanr@aol.com Jeffwben@aol.com, cml@cs.umd.edu
NASDAQ is an abbreviation for the National Association of Securities Dealers Automated Quotation system. It is
also commonl...

Ref:... NASDAQ Last Revised: 25 Oct 1996 From: billmanr@aol.com Jeffwben@aol.com, lott@invest-faq.com
NASDAQ is an abbreviation for the National Association of Securities Dealers Automated Quotation system. It is
also commonl...

Ref:...ible for the operation and regulation of the NASDAQ stock market and over-the-counter markets. NASDAQ
stands for the National Association of Securities Dealers Automated Quotation System. A nationwide compute...
Ref:...site Index is a value weighted index that monitors more than 2,000 stocks traded over-the-counter. NASDAQ
stands for National Association of Securities Dealers Automated Quotations. It has been available since 1971...
Ref:... incentive stock option under Section 422 of the Code. (k) "NASDAQ" means the National Association of
Securities Dealers, Inc. Automated Quotation System...



[...section deleted...]



This search: "NASDAQ stands for" "NASDAQ is an abbreviation" "NASDAQ means" Search engine pages:
AltaVista Excite HotBot Infoseek Lycos Northern Light WebCrawler Yahoo

[Table of per-engine results (columns include Use, Retrieved, Processed, Duplicates) for AltaVista, Excite, HotBot, Infoseek, Lycos, Northern Light, WebCrawler, and Yahoo; most values illegible in source.]

Total 61 61 61 6

3 terms: 0 2 terms: 0 1 term: 26 0 terms: 9 duplicate context: 14 cannot access: 3 invalid link: 3
Figure 22. NECI meta search engine response for the query What does NASDAQ stand for? 
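
The transformation visible in Figure 22, where the question "What does NASDAQ stand for?" is searched as the phrases "NASDAQ stands for", "NASDAQ is an abbreviation" and "NASDAQ means", can be sketched as follows. This is only an illustration under stated assumptions: the regular expression, the `TEMPLATES` table, and the function name `expand_question` are hypothetical, and only the three output phrases are taken from the figure.

```python
import re

# Hypothetical rule table: a question pattern mapped to the phrase queries
# that are likely to appear in a page answering that question.
TEMPLATES = {
    r"^what does (?P<x>.+) stand for\??$": [
        '"{x} stands for"',
        '"{x} is an abbreviation"',
        '"{x} means"',
    ],
}


def expand_question(question):
    """Return phrase queries for a recognized question form.

    Unrecognized input is returned unchanged as a single query.
    """
    q = question.strip()
    for pattern, phrases in TEMPLATES.items():
        match = re.match(pattern, q, flags=re.IGNORECASE)
        if match:
            return [p.format(x=match.group("x")) for p in phrases]
    return [q]


queries = expand_question("What does NASDAQ stand for?")
```

Each returned phrase would then be submitted to the individual search engines as a quoted query; how the engine described here recognizes question forms is not shown in the figure.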
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Infoseek found 23,064,238 pages containing at least one of these words: what does NASDAQ stand for? 



Related Topics 



snags 






Search Results 1-10 



Hide Summaries | next 10
Stock Research NASDAQ - are you a shrewd investor? FREE copy of the best

Stock Research NASDAQ - are you a shrewd investor? FREE copy of the best Stock Research NASDAQ. Be ahead of the
Market, Stock Research NASDAQ that always call it right. Our ...
72% http://www.updatenews.com/key/investresearch/ (Size 5.3K)

CyberLand HQ

CyBerCorp designs and develops real-time decision support, execution and trading systems for NASDAQ stock market traders.
Be sure to check out CyBerTrader.

63% http://www.cyber-corp.com/ (Size 4.0K)

InvestQuest® AMUSEMENT & RECREATION SERVICES

AMUSEMENT & RECREATION SERVICES. [InvestQuest® Home | Company List | Industries | Company Search]
ALLIANCE GAMING CORP (NASDAQ:ALLY) ALPHA HOSPITALITY CORP (NASDAQ:ALHY) AMERICAN ...
63% http://www.investquest.com/.html/79_industry.htm (Size 11.9K)

InvestQuest® INSURANCE CARRIERS
INSURANCE CARRIERS. [InvestQuest® Home | Company List | Industries | Company Search] CENTURY
INDUSTRIES (NYSE:TW) ACCEL INTERNATIONAL CORP (NASDAQ:ACLE) ACCEPTANCE INSURANCE ...
62% http://www.investquest.com/.html/63_industry.htm (Size 26.8K)

InvestQuest® WHOLESALE TRADE-DURABLE GOODS

WHOLESALE TRADE-DURABLE GOODS. [InvestQuest® Home | Company List | Industries | Company Search] AAR
CORP (NYSE:AIR) ABATIX ENVIRONMENTAL CORP (NASDAQ:ABIX) ACE HARDWARE CORP ...
62% http://www.investquest.com/.html/50_industry.htm (Size 24.2K)

Stocks by Symbol - C

Stocks by Symbol - C. C (NYSE) Chrysler CA (TSE) Canadian Airlines CANOTC%3AMGIS (OTC) Magisoft Software Com
CAWS+ (NASDAQ) CAI Wireless Systems Inc. CBMI (NASDAQ) Creative ...
62% http://stockclub.com/stocks/symbol-c-index.html (Size 9.6K)

Stocks by Company Name - A

A+ Communications (NASDAQ:ACOM) A. G. Edwards (NYSE:AGE) Abatix Environmental (NASDAQ:ABIX) access health
(NASDAQ:ACCS) acclaim (NASDAQ:AKLM) Ackerley Communications (NASDAQ:AK) ...
62% http://stockclub.com/stocks/name-a-index.html (Size 8.5K)

InvestQuest® FABRICATED METAL PRODUCTS
FABRICATED METAL PRODUCTS. [InvestQuest® Home | Company List | Industries | Company Search] AAVID
THERMAL TECHNOLOGIES INC (NASDAQ:AATT) ABC RAIL PRODUCTS CORP (NASDAQ:ABCR) ABS ...
62% http://www.investquest.com/.html/34_industry.htm (Size 13.3K)

InvestQuest® RUBBER AND MISC. PLASTICS PRODUCTS

RUBBER AND MISC. PLASTICS PRODUCTS. [InvestQuest® Home | Company List | Industries | Company Search]
ADVANCED MATERIALS GROUP INC (NASDAQ:ADMG) AEP INDUSTRIES INC (NASDAQ:AEPI) ...
62% http://www.investquest.com/.html/30_industry.htm (Size 10.6K)

ENGINEERING & MANAGEMENT SERVICES. [InvestQuest® Home | Company List | Industries | Company Search]
ADVANCED DETECTORS INC (OTC Bulletin Board:3ADET) AERO SYSTEMS ENGINEERING INC ...
62% http://www.investquest.com/.html/87_industry.htm (Size 15.8K)

Hide Summaries | next 10
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Figure 23. Response of Infoseek for the query What does NASDAQ stand for? 




Searching for: "rainbow is created" "makes a rainbow created" "rainbow is produced" "rainbow is made"
using: HotBot Infoseek AltaVista Excite Lycos Northern Light Yahoo WebCrawler.

Tip: For better precision with multiple terms you might like to use + to ensure that the results contain specific terms (e.g.
+"lee giles" +optics).

Ref:... the green flash, it helps to know how our atmosphere effects sunlight. Coincidentally, the phenomenon
responsible for the green flash is also the one that paints rainbows across Hawaii's sky. A rainbow is created when
rays of sunlight enter a raindrop, bounce around inside, and exit. Light from the sun consists of a potpourri of colors
that are each bent by a different amount inside a raindrop. This unequa...

Ref:...scapes the raindrop after it is reflected once. A part of the ray is reflected again and travels along inside the drop
to emerge from the drop. The rainbow we normally see is called the primary rainbow and is produced by one internal
reflection the secondary rainbow arises from two internal reflections and the rays exit the drop at an angle of 50
degrees rather than the 42 degrees for the red primary bow. ...

Ref:...e rainbow we do not see the sun, and we rarely see a rainbow in winter. How do we explain this appearance of a 
bow, double bows, size of arc, and brightness of the rainbow? Answer The rainbow is produced by sunlight passing 
through a raindrop or a collection of rain drops. A typical raindrop is spherical and as a light ray strikes the surface of 
the raindrop, some light is reflected and some passes ... 

Ref:...se to us. He promised that the earth will never be destroyed again by a flood. As a sign of that promise He put a 
rainbow in the sky. Whenever we see a rainbow, we can think of God's promise. The rainbow is made up of all the 
colors. Back To Index Next Page... Page 1 ... 

Ref:...two rainbows, the narrower male rainbow and the wider female. The male rainbow can not stop the rain by itself. 
When it is followed by the female the rain stops. Other Native Americans believe the rainbow is made from the souls
of wild flowers that lived in the forest and lilies from the prairies. A Japanese myth tells of the first man Isanagi and the
first woman Isanami who stood on the floating bri... /...te of samsara before the clear light of Nirvana or heaven. In
Arabia the rainbow is a tapestry draped by the hands of the south wind. It is also called the cloud's bow or Allah's bow.
In Islam the rainbow is made up of four colors red, yellow, green and blue related to the four elements. In myths of
India the Goddess Indra not only carries a thunderbolt like the Greek God Zeus but she also carries a ...

Ref:.... true b. false 13. The average speed of light is greatest in ___. a. red glass, b. orange glass, c. green glass, d. blue
glass, e. is the same in all of these. 14. The secondary rainbow is produced with an extra (choose the best answer) a.
dispersion, b. reflection, c. refraction, d. diffraction. 15. If a person has green cones that are weak, then yellow light
will appear to t...

Ref:...ever wonder what makes the color in a rainbow? The answer is sunlight, it has all of the colors of the rainbow in
it, but they are all mixed up together so you are not able to see them. The rainbow is made up of drops of water.
When sunlight passes through a drop of water, it bends and the colors inside the light split apart and are separated so
that we can see them. When the sunlight passes through...



[ ...section deleted... ] 



Figure 24. NECI meta search engine response for the query How is a rainbow created? 
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Related Topics 



Infoseek found 20,594,341 pages containing at least one of these words: how is a rainbow created?



Search Results 1-10 



Hide Summaries | next 10
Hayden Books: Creative Techniques

Creative Techniques brought to you by Hayden Books Working with Layers: Creating a Rainbow Effect Art by Gary Poyssick
Comments: This tip shows you how to use layers and Photoshop's ...

64% http://www.mcp.com/16947W817CB12M (Size 4.2K)

Rainbow Sports Networks and The Sporting News Create Alliance

World Data Web Connect InsideMedia August 16, 1996 Rainbow Sports Networks and The Sporting News Create Alliance
Rainbow Programming Holdings' sports networks - NewSport, Prime and ...
64% hup7/wwwjne<u*acentja].com7Magazin^ (Size 3.9K)

Hayden Books: Creative Techniques

Creative Techniques brought to you by Hayden Books Working with Layers: Creating a Rainbow Effect Art by Gary Poyssick

Comments: This tip shows you how to use layers and Photoshop's ...

64% htm://wwwjrcp.conVl8229751 149932^ (Size 4.2K)






Rainbow Warriors? 

Rainbow Warriors? Hacker All the colors of the rainbow... The appropriate excerpt from the alt.2600 FAQ. You are left on your 
own recognizance 

63% hUp^Avwwjhc]oos.conVsjgames/hacker/chroma.html (Size 3.8K) 
Pet Loss and Rainbow Bridge 

Rainbow Bridge and Pet Loss grief pages, may post poems, photos, tributes or just stop by and be comforted. 
62% http://www.primenet.com/~meggie/bridge.htm (Size 46.2K)

The sky's the limit: student activities

We all see something different when we look up to the sky. The clouds often stir our imagination allowing us to see animated
images being formed by those mysterious "puffs of ...
62% http://www.solutions.ibm.com/k12/teacher/cloudss.html (Size 9.6K)

Asymetrix 3D F/X(tm) Drag and Drop 3D for Windows screenshot Create

high-quality, professionally rendered three-dimensional images and animations with Asymetrix 3D F/X. You can easily add
dazzling 3-D effects and sophisticated animation to any ...
62% http://3dsite.com/3dsite/cgi/softwa^ (Size 6.9K)

Rainbow Video Authoring Services

CD-ROM Authoring and Web Development Services. Not satisfied with offering the best in regional video production,
Rainbow Video offers complete CD-ROM authoring services for both the ...
61% http://www.rainbowvideo.com/author.htm (Size 2.7K)

Bio

Bio [TOC Icon][Feedback Icon] photo by Wink Van Kempen Fred Stern The Rainbow Maker Fred Stern was raised in New
York and is an acknowledged innovator in environmental art. He has ...
61% hitp://www^aneLcomAainbow/bio.htm (Size 4.1K)

Oregon Department of Fish and Wildlife Weekly Fishing Report

Oregon Department of Fish and Wildlife. Now sorted by zones!. Updated: August 7, 1997. * Denotes scheduled stocking
Zones. Northwest Zone | Southwest Zone | Willamette Zone | ...

60% http://www.dfw.state.or.us/ODFWhtml/RecReports/fishing.html (Size 40.5K)
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Figure 25. Response of Infoseek for the query How is a rainbow created? 
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Find: what is a mealy machine? 




Hits: 100 Context: 200 Cluster: no Tracking: no



Searching for: "mealy machine is" "mealy machine refers to" "mealy machine means" "mealy machine will"
"mealy machine helps" using: HotBot Infoseek AltaVista Excite Lycos Northern Light Yahoo WebCrawler.

Tip: For better precision with multiple terms you might like to use + to ensure that the results contain specific terms (e.g.
+"lee giles" +optics).

Ref:...L such that all state memory changes are made with respect to the clock signal. T F A Moore machine usually
has less states than an equivalent Mealy machine. T F A potential problem with a Mealy Machine is that the output
changes are not synchronized with clock changes. Fill in the blanks. 10 points at 2 points per blank The canonical SOP
form of an expression results in a level circuit. ...

Ref:... input alphabet and by creating multiple input mechanisms for reading events. Second, the transition function
must be modified so that controller tasks can be performed during state transitions. A Mealy machine is a DFA that
defines symbols which are output during state transitions. For the current purpose, a similar mechanism is used to
perform controller tasks such as moving a robot, opening a vise of fi...

Ref:... these general premises to the Collatz conjecture, which of all open problems at the moment is perhaps the most
conveniently conducive to the approach. = Generalized Sequential Machines, GSMs A Mealy machine is a Finite
State Automaton with a single output symbol associated with each state transition (e.g. see [1, p.42]). A GSM or
Generalized Sequential Machine is similar it is a FSA with an output strin...

Ref:... next state which then effect the output. (State refers to all latched events and values.) Argument for Mealy is
that the output depends on the transition, thus ignoring the buffers, the CFSM is a Mealy machine. (Will explore this
more later.) Issues concerning compostion have not been resolved by the Polis group, there is no composition as it
stands. Resources A Formal Methodology for Hardware/Softwar...

Ref:... State Machines We consider two types of state machines, Moore and Mealy. A Moore machine is a Mealy
machine whose output does not directly depend on its input. Mealy Machines A Mealy Machine is a 6-tuple M = ( S,
Q, q_0, D(a,q), l(a,q) ) where S != 0 is a finite set of input symbols (we will use a to denote a particular input
symbol) D != 0 is a finite set...

Ref:...ving on state register flip-flops, it is still desirable to use them. This leads to alternative synchronous design
styles for Mealy machines. Simply stated, the way to construct a synchronous Mealy machine is to break the direct
connection between inputs and outputs by introducing storage elements. One way to do this is to synchronize the
Mealy machine outputs with output flip-flops. See Figure 8...

Ref:...itions. A FSA is called non-deterministic if there is one or more transitions from one state to another for a given
input. A Moore machine is an FSA which associates an output with each state and a Mealy machine is an FSA which
associates an output with each transition. The Moore and Mealy FSAs are important in applications of FSAs.
Equivalence of deterministic and non-deterministic fsa It might seem ...

Ref:...icle. and you, will make use of this three-block model to describe a state machine in VHDL using our four-step
design procedure. Moreover, the outputs of a state machine define its type. That is, a Mealy machine is one in which
the outputs are a function of both the inputs and the current state-variables (Figure 1). A Moore machine has outputs
that are a function of the state-variables only (Figure 2). And a M...

Ref:... 60 Bestudeer van module 1.1 de blz. 7 t/m blz. 13 grondig. 20 Als u niet (of niet meer zo goed) weet wat een
toestandsmachine, een toestandsdiagram, een Moore- of een Mealy-machine is, zoek dan wat u niet (meer) weet op
in uw boek(en) over digitale techniek. 30. Maak de oefenopdracht van blz. 13 10 Lees de rest van module 1.1
oppervlakkig ...

[ ...section deleted... ]



Figure 26. NECI meta search engine response for the query What is a mealy machine? 
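
The "Searching for:" line in Figure 26 suggests that a question of the form "What is X?" is rewritten into a set of quoted phrases before being sent to the search engines. The following Python sketch illustrates that idea only. The suffix list is copied from the figure; the function name expand_what_is and the regular expression are assumptions made for illustration and are not taken from the specification.

```python
import re

# Suffixes visible in the "Searching for:" line of Figure 26.
WHAT_IS_SUFFIXES = ["is", "refers to", "means", "will", "helps"]


def expand_what_is(query):
    """Rewrite 'What is (a|an|the) X?' into quoted phrase queries.

    Hypothetical helper: a query that does not match the pattern is
    returned unchanged as a one-item list.
    """
    match = re.match(
        r"^\s*what\s+is\s+(?:(?:a|an|the)\s+)?(.+?)\s*\?*\s*$",
        query,
        flags=re.IGNORECASE,
    )
    if not match:
        return [query]
    subject = match.group(1).lower()
    # One quoted phrase per suffix, e.g. "mealy machine is".
    return ['"%s %s"' % (subject, suffix) for suffix in WHAT_IS_SUFFIXES]


if __name__ == "__main__":
    for phrase in expand_what_is("What is a mealy machine?"):
        print(phrase)
```

Quoting each phrase asks the underlying engines for an exact phrase match, which is why the results in the figure contain sentences that read like definitions.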
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Find: f~ ZZTT ~ ~ _ ^_ _ ^7"^"" ^ 

Hits: 50 Context: 100 Cluster: no Tracking: no



Tip: Clicking on the search engine links in the "Searching for:" line will show the search engine response to the current query.

Recently modified URLs:

Page ftp://ftp.kernel.org/pub/linux/kernel/testing/ [Stop tracking URL]

Recent documents matching: signafy [Mark as seen] [Stop tracking query]

INFORMIX-Universal Server Powers Comprehensive Media Asset Management Package From Virage; ... NW n/a 12k
http://www.infoseek.com

... Group The Content Group. Excalibur Technologies Corp. Muscle Fish, LLC Silicon Graphics, Inc. Signafy, Inc.
TECHMATH GmbH and TATA Consultancy Services. About INFORMIX-Universal Server INFORMIX-Un...

INFORMIX-Universal Server Powers Comprehensive Media Asset Management II n/a 13k

http://www.infoseek.com/Content?arn=ix.BWIR1997091521525176X&qt=signafy&col=IX&nh=25&kt=A&ak=industrynews
...roup The Content Group. Excalibur Technologies Corp. Muscle Fish, LLC Silicon Graphics, Inc. Signafy, Inc.
TECHMATH GmbH and TATA Consultancy Services. About INFORMIX-Universal Server I...





Today at NECI

NJ Prog. Language Workshop - Multipurpose Rooms 2F00 2F01, 2nd Floor, (AW)

OUT

Ebbesen, Gottlieb, de Ruyter, Thornber



Recent articles about NEC Research in the press: 



NEC Research Promises Terabit Memory Chips NT n/a 20k http://www.techweb.com:80/wire/news/1997/09/0911nec.html

NEC Research Promises Terabit Memory Chips ... /... NEC Research Promises Terabit Memory Chips ... /... Chips
International NEC Research Promises Terabit Memory Chips (09/11/97 12:00 p.m. EDT) By John Boyd, ...

[...section deleted...] 



Figure 27. Sample home page showing new hits for a query and recently modified URLs. 



http://www.research.digital.com/SRC/publication New text:

1. Paul McJones and John DeTreville. Each to Each programmer's reference manual. Technical Note 1997-023,
Digital Equipment Corporation Systems Research Center, Palo Alto, CA, October 1997.



Find People 



SRC Publications List 



1. Paul McJones and John DeTreville. Each to Each programmer's reference manual. Technical Note 1997-023,
Digital Equipment Corporation Systems Research Center, Palo Alto, CA, October 1997.

2. Monika Henzinger and Han La Poutre. Certificates and fast algorithms for biconnectivity in fully-dynamic
graphs. Technical Note 1997-021, Digital Equipment Corporation Systems Research Center, Palo Alto, CA,
September 1997.

3. Monika Henzinger. Improved data structures for fully dynamic biconnectivity. Technical Note 1997-020, Digital
Equipment Corporation Systems Research Center, Palo Alto, CA, September 1997.

4. Monika Henzinger and Valerie King. Maintaining minimum spanning trees in dynamic graphs. Technical Note
1997-019, Digital Equipment Corporation Systems Research Center, Palo Alto, CA, September 1997.

5. Marc Brown, Marc A. Najork, and Roope Raisamo. A Java-based implementation of Collaborative Active
Textbooks. In 1997 IEEE Symposium on Visual Languages, pages 372-379. IEEE Computer Society,
September 1997. (PDF), (PostScript). (Copyright 1997 IEEE).

[...section deleted...]

Digital Systems Research Center

Legal notice

Send comments to the owner of this page.

130 Lytton Avenue, Palo Alto, CA 94301 Last modified: Tuesday, 07-Oct-97 10:32:46 PDT

Tel: (415) 853-2100 Fax: (415) 853-2104 

Copyright Digital Equipment Corporation 1995-1997. All Rights Reserved. 



Figure 28. Sample page view showing the text which has been added to the page since the last time it was viewed. 
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Figure 29. Coverage of each engine with respect to the combined coverage of all 6 (averaged over 500 queries). 



Fraction of Results versus Number of Search Engines 
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Figure 30. Coverage as the number of search engines is increased (averaged over 500 queries). The extrapolation is 
created using the assumption that the coverage increases logarithmically with the number of search engines.
Significantly more documents are returned as the number of search engines is increased.
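
The extrapolation described in the caption of Figure 30 assumes coverage grows logarithmically with the number of engines. One way to sketch such an extrapolation in Python is an ordinary least-squares fit of y = a + b ln(n). The function names and the sample values below are hypothetical placeholders, not measurements read from the figure, and the caption does not say which fitting procedure was actually used.

```python
import math


def fit_log(ns, ys):
    """Least-squares fit of y = a + b * ln(n); returns (a, b)."""
    xs = [math.log(n) for n in ns]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def extrapolate(a, b, n):
    """Predicted coverage for n engines under the logarithmic assumption."""
    return a + b * math.log(n)


if __name__ == "__main__":
    # Placeholder values for illustration only (not data from Figure 30).
    engines = [1, 2, 3, 4, 5, 6]
    fraction = [0.30, 0.52, 0.67, 0.80, 0.91, 1.00]
    a, b = fit_log(engines, fraction)
    print("predicted fraction with 10 engines:", extrapolate(a, b, 10))
```

A logarithmic model implies diminishing returns: each added engine contributes fewer new documents than the previous one, yet the total keeps growing.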




Figure 31. In order to estimate the size of the indexable Web (the Web excluding pages not considered by the search 
engines), we compare the overlap between engines to the number of documents returned from all 6 engines combined. 
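
The caption of Figure 31 states only that the overlap between engines is compared with the combined result count. A common way to turn an overlap into a size estimate is the capture-recapture (Lincoln-Petersen) estimator, sketched below in Python. Treating two engines as independent samples of the indexable Web is an assumption of this sketch, the function names are hypothetical, and the specification's own calculation may differ.

```python
def estimate_total(docs_a, docs_b):
    """Lincoln-Petersen estimate of the population size from two samples.

    If engine A indexes a fraction p of the indexable Web and is
    independent of engine B, then about p of B's documents also appear
    in A, so p ~ |A & B| / |B| and total ~ |A| / p = |A| * |B| / |A & B|.
    """
    a, b = set(docs_a), set(docs_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("no overlap: the estimate is undefined")
    return len(a) * len(b) / overlap


def combined_fraction(all_docs, estimated_total):
    """Fraction of the estimated total covered by all engines combined."""
    return len(set(all_docs)) / estimated_total


if __name__ == "__main__":
    # Hypothetical document identifiers for illustration only.
    engine_a = range(0, 60)
    engine_b = range(40, 100)
    total = estimate_total(engine_a, engine_b)
    print("estimated total:", total)
    print("combined fraction:",
          combined_fraction(list(engine_a) + list(engine_b), total))
```

If the engines are positively correlated (for example, both favor popular pages), the overlap is inflated and the estimate comes out too low, which is consistent with the remark in the caption of Figure 32 that the estimate is expected to be lower than the true value.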
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Figure 32. Coverage of each engine with respect to the estimated size of the indexable Web (the estimate is expected 
to be lower than the true value). 
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Figure 33. Histograms of the major search engine response times, continued in the next figure. 
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Lycos Response Time 
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Figure 34. (Continued from the previous figure) Histograms of the major search engine response times (above, in 
seconds), and a histogram of the response time for the first response when queries are made to the six engines
simultaneously (below, created from 10,000 samples drawn from the previous distributions). The frequency is normalized
so that it represents the percentage of responses that fall within each section of the histogram. The last section of the
histograms also contains all samples with longer times.
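
The lower histogram of Figure 34 is described as being built from 10,000 samples drawn from the per-engine distributions. A minimal Python sketch of that kind of simulation follows: each trial draws one response time per engine and keeps the minimum. The function names, the resampling of observed times with random.choice, and the sample data are assumptions made for illustration.

```python
import random


def first_response_samples(engine_times, n_samples=10000, seed=0):
    """Simulate the time of the first response among simultaneous queries.

    engine_times maps an engine name to a list of observed response
    times in seconds; each trial resamples one time per engine.
    """
    rng = random.Random(seed)
    return [
        min(rng.choice(times) for times in engine_times.values())
        for _ in range(n_samples)
    ]


def histogram_percent(samples, bin_width, n_bins):
    """Percentage of samples per bin; the last bin absorbs longer times."""
    counts = [0] * n_bins
    for t in samples:
        index = min(int(t // bin_width), n_bins - 1)
        counts[index] += 1
    return [100.0 * c / len(samples) for c in counts]


if __name__ == "__main__":
    # Hypothetical response times (seconds), not measurements from the figures.
    observed = {
        "engine_1": [0.8, 1.2, 2.5, 6.0],
        "engine_2": [1.0, 1.5, 3.0, 9.0],
        "engine_3": [0.6, 2.0, 4.0, 12.0],
    }
    samples = first_response_samples(observed)
    print(histogram_percent(samples, bin_width=1.0, n_bins=5))
```

Because the minimum of several draws is taken, the simulated first-response distribution is shifted toward shorter times than any single engine's distribution.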
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Figure 36. Median time for the first of n Web search engines to respond. 
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Figure 37. Response time for arbitrary Web pages. 
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Figure 38. Median time to download the first of n pages requested simultaneously. 
Response Time for the First Result from the Meta Engine 
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Figure 39. Time for the meta engine to display the first result. 
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Declaration and Power of Attorney For Patent Application 

English Language Declaration 



As a below named inventor I hereby declare that:

My residence, post office address and citizenship are as stated below next to my name.

I believe I am the original, first and sole inventor (if only one name is listed below) or an original, 
first and joint inventor (if plural names are listed below) of the subject matter which is claimed and for 
which a patent is sought on the invention entitled 

META SEARCH ENGINE 

the specification of which

(check one)

is attached hereto.

was filed on ______ as United States Application No. or PCT International
Application Number ______
and was amended on ______ (if applicable).

I hereby state that I have reviewed and understand the contents of the above identified specification,
including the claims, as amended by any amendment referred to above.

I acknowledge the duty to disclose to the United States Patent and Trademark Office all information
known to me to be material to patentability as defined in Title 37, Code of Federal Regulations,
Section 1.56.

I hereby claim foreign priority benefits under Title 35, United States Code, Section 119(a)-(d) or 
Section 365(b) of any foreign application(s) for patent or inventor's certificate, or Section 365(a) of
any PCT International application which designated at least one country other than the United States, 
listed below and have also identified below, by checking the box, any foreign application for patent or 
inventor's certificate or PCT International application having a filing date before that of the application 
on which priority is claimed. 

Prior Foreign Application(s) Priority Not Claimed 



□

(Number) (Country) (Day/Month/Year Filed)

□

(Number) (Country) (Day/Month/Year Filed)

□

(Number) (Country) (Day/Month/Year Filed)

Form PTO-SB-01 (6-95) (Modified)

PO2/REV02

Patent and Trademark Office-U.S. DEPARTMENT OF COMMERCE
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r ' ; 

I hereby claim the benefit under 35 U.S.C. Section 119(e) of any United States provisional
application(s) listed below: 



60/062,958 October 10, 1997 



(Application Serial No.) 


(Filing Date) 


(Application Serial No.) 


(Filing Date) 


(Application Serial No.) 


(Filing Date) 



I hereby claim the benefit under 35 U.S.C. Section 120 of any United States application(s), or
Section 365(c) of any PCT International application designating the United States, listed below and,
insofar as the subject matter of each of the claims of this application is not disclosed in the prior
United States or PCT International application in the manner provided by the first paragraph of 35
U.S.C. Section 112, I acknowledge the duty to disclose to the United States Patent and Trademark
Office all information known to me to be material to patentability as defined in Title 37, C.F.R.,
Section 1.56 which became available between the filing date of the prior application and the national
or PCT International filing date of this application:

(Application Serial No.)


(Filing Date) 


(Status) 






(patented, pending, abandoned) 


(Application Serial No.)


(Filing Date) 


(Status) 






(patented, pending, abandoned) 


(Application Serial No.) 


(Filing Date) 


(Status) 




(patented, pending, abandoned) 



I hereby declare that all statements made herein of my own knowledge are true and that all 
statements made on information and belief are believed to be true; and further that these statements
were made with the knowledge that willful false statements and the like so made are punishable by 
fine or imprisonment, or both, under Section 1001 of Title 18 of the United States Code and that such 
willful false statements may jeopardize the validity of the application or any patent issued thereon. 



Form PTO-SB-01 (6-95) (Modified) 



Patent and Trademark Office-U.S. DEPARTMENT OF COMMERCE
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POWER OF ATTORNEY; As a named inventor, I hereby appoint the following attorney(s) and/or 
agent(s) to prosecute this application and transact all business in the Patent and Trademark Office 
connected therewith, (list name and registration number) 

Stephen D. Murphy; Reg. No. 22,002 Paul J. Esatto, Jr.; Reg. No. 30,749

Leopold Presser; Reg. No. 19 fill John S. Sensny; Reg. No. 23,757

William C. Roch; Reg. No. 24,972 Mark J. Cohen; Reg. No. 32,211

Kenneth L. King; Reg. No. 24,223 Richard L. Catania; Reg. No. 32,608

Frank S. DiGiglio; Reg. No. 31,346 Donald T. Black; Reg. No. 27,999

Philip J. Feig; Reg. No. 27,328
Andrew G. Isztwan; Reg. No. 40,028

Send Correspondence to: Paul J. Esatto, Jr., Esq.

Scully, Scott, Murphy & Presser

400 Garden City Plaza

Garden City, New York 11530

Direct Telephone Calls to: (name and telephone number)
Paul J. Esatto, Jr., Esq. at (516) 742-4343



Full name of sole or first inventor
Stephen R. Lawrence 




Date

New York, New York

Citizenship
Australia

Post Office Address

235 West 48th Street, Apartment 11B

New York, New York 10036



Full name of second inventor, if any

C. Lee Giles

Second inventor's signature

Residence

Lawrenceville, New Jersey 08648




Citizenship 




United States of America 




Post Office Address 




37 WoodUne Road 




Lawrenceville, New Jersey 08648
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